1 



THE MONTE CARLO METHOD 
IN QUANTUM FIELD THEORY 

COLIN MORNINGSTAR 

Department of Physics, Carnegie Mellon University, 
Pittsburgh, PA 15213, USA 
E-mail: colin^morningstar&cmu. edu 



This series of six lectures is an introduction to using the Monte Carlo method 
to carry out nonperturbative studies in quantum field theories. Path integrals 
in quantum field theory are reviewed, and their evaluation by the Monte Carlo 
method with Markov-chain based importance sampling is presented. Proper- 
ties of Markov chains are discussed in detail and several proofs are presented, 
culminating in the fundamental limit theorem for irreducible Markov chains. 
The example of a real scalar field theory is used to illustrate the Metropolis- 
Hastings method and to demonstrate the effectiveness of an action-preserving 
(microcanonical) local updating algorithm in reducing autocorrelations. The 
goal of these lectures is to provide the beginner with the basic skills needed to 
start carrying out Monte Carlo studies in quantum field theories, as well as to 
present the underlying theoretical foundations of the method. 
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1. Introduction 

Some of the most interesting features of quantum field theories, such as 
spontaneous symmetry breaking and bound states of particles, require com- 
putational treatments beyond ordinary perturbation theory. The Monte 
Carlo method using Markov-chain based importance sampling with a space- 
time lattice regulator is a powerful tool for carrying out such studies. One 
of the most prominent applications of such methods is hadron formation 
and quark confinement in quantum chromodynamics (QCD). This series of 
six lectures is an introduction to using the Monte Carlo method to carry 
out nonperturbative studies in quantum field theories. 

In Sec. 2, the path integral method in nonrelativistic quantum mechan- 
ics is briefly reviewed and illustrated using several simple examples: a free 
particle in one dimension, the one-dimensional infinite square well, a free 
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particle in one dimension with periodic boundary conditions, and the one- 
dinicnsional simple harmonic oscillator. The extraction of observables from 
correlation functions or vacuum expectation values is discussed, and the 
evaluation of these correlation functions using ratios of path integrals is de- 
scribed. The crucial trick of Wick rotating to imaginary time is introduced. 

The evaluation of path integrals in the imaginary time formalism us- 
ing the Monte Carlo method is discussed next in Sec. 3. After a brief re- 
view of probability theory in Sec. 3.1, simple Monte Carlo integration is 
described, and its justification by the law of large numbers and the cen- 
tral limit theorem is outlined. The need for clever importance sampling 
is then emphasized, leading to the use of stationary stochastic processes 
and the modification of the Monte Carlo method to take autocorrelations 
into account. Markov chains, one of the most convenient and useful of sta- 
tionary stochastic processes, are introduced and their properties are dis- 
cussed in detail in Sec. 3.4. This subsection is rather technical, containing 
a host of definitions, much mathematics, and many proofs of the properties 
of Markov chains, culminating in the fundamental limit theorem for irre- 
ducible Markov chains. The Metropolis-Hastings method of constructing 
a Markov chain appropriate to the path integral to be evaluated is then 
described. 

Monte Carlo evaluations of the path integrals needed for correlation 
functions in a one-dimcnsional simple harmonic oscillator arc presented in 
Sec. 4 as a first simple example, with particular attention paid to autocor- 
relations. Next, Sec. 5 is dedicated to Monte Carlo calculations in one of 
the simplest quantum field theories: a real scalar field in two spatial dimen- 
sions (three space-time dimensions). The theory is first formulated on a 
space-time lattice, then a simple Metropolis updating scheme is described. 
The Metropolis method is seen to be plagued by strong autocorrelations. 
An action-preserving (microcanonical) updating method is then described, 
and its effectiveness in reducing autocorrelations is demonstrated. Monte 
Carlo estimates in the free scalar field theory are compared with exactly 
known results, then a (j)^ interaction term is included in the action. This 
section introduces correlated-x^ fitting, as well as jackknife and bootstrap 
error estimates. 

There is insufficient time in these six introductory lectures to describe 
lattice QCD in any detail. Only very brief comments about lattice QCD 
are made in Sec. 6 before concluding remarks are given in Sec. 7. The goal 
of these lectures is to provide the beginner with the basic skills needed to 
start carrying out Monte Carlo studies in quantum field theories, as well as 
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to present the underlying theoretical foundations of the method. References 
for further reading are given at the end for those interesting in pursuing 
studies in lattice QCD. 

2. Path integrals in quantum mechanics 

2.1. Correlation functions and imaginary time 

Consider a small particle of mass m constrained to move only along the 
a;-axis. Its trajectory is described by its x location as a function of time, 
which we write as x{t). A key quantity in the quantum mechanics of such 
a system is the transition amplitude 

Z{b,a) = {Xb{tb) I Xa{ta)), 

where Z{b, a) is the probability amplitude for a particle to go from point 
Xa at time ta to point Xb at time t^. Here, we will work in the Heisenberg 
picture in which state vectors are stationary and operators and their 
eigenvectors evolve with time 

x{t) ^ e'"'/^ x(0) e-'"'/^, 

\x[t))=e^"'"-^\xm- 

We often will shift the Hamiltonian so the ground state energy is zero: 

H \^n{t)) ^ Er,\Mt)), Eo^O, 

\Mt)) = \M0)) ^ |o). 

The transition amplitude contains information about all energy levels and 
all wavefunctions, as can be seen from its spectral representation. Insert a 
complete and discrete set of Heisenberg-picture eigenstates \4'n{t)) of the 
Hamiltonian H into the transition amplitude, 

Z{b,a) = (Xbitb) I Xa{ta)) ^^(Xb{tb) \(l)n{tb)) ((l)n{tb)\ Xa{ta)), 

n 

then use |0„(t)) = e*-f^*/''|0„(O)) = e''-^"*/'*|0„(O)) to obtain 

Z(6,a) =^e-*^"(*^-*")/''(xh(4) IMtb)) {Mta)\ xM). 

n 

Since {x{t)\(j)n{t)) = fn{x) is the wavefunction in coordinate space of the 
n-th stationary state, one sees how the transition amplitude provides infor- 
mation about both the stationary state energies and their wavefunctions: 

Z{b,a) = ^^„(x,)<(x,) e-'^"(*'-*")/^ 
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Often, one is interested in evaluating the expectation value of observ- 
ables in the ground state, or vacuum. The above transition amplitude 
can yield this information by taking ta — and if, = T in the limit 
r (1 — ie)oo: 

OO 

^ (.Tb(0)|0)(0|x„(0)), 

which follows from inserting a complete set of energy eigenstates, using 
En+i > En, Eq = 0, and assuming a nondegenerate vacuum. This vacuum 
saturation trick allows the possibility of probing ground state (vacuum) 
properties. Now apply the limit T ^ (1 — ie)oo to a more complicated 
amplitude 

{xh{T)\x{t2)x{h)\Xa{~T)) 

= (xfc(0)|e-'^^/^ x{t2)x{h) e-'^^/^|a;a(0)) 

= V(xfc(O)|0„(O))(0„(O)|.r(t2)x(ti)|(/)„,(O))(0„,(O)|a.a(O)) 

^ {xt,mO){0\xit2)xih)\0){0\Xam. 

Hence, the vacuum expectation value of x(t2)x{ti) is obtained from 

/nl (, \ a Mn\ r {xbiT)\xih)xih)\xai-T)) 
{0\x{t2)x{ti}\0) ^ hm — ■ 

This result generalizes to higher products of the position operator. 

A key point to keep in mind is that all observables can be extracted 
from the correlation functions (vacuum expectation values) of the position 
operator x{t). For example, the energies of the stationary states can be 
obtained from 

{0\x{t)x{0)\0) = (0|e^^*/'"'x(0)e"*^*/'^x(0)|0) 

-5](O|x(O)e-»«*/''|0„(O))(0„(O)|x(O)|O) 

n 

^^|(O|x(O)|0„(O))pe-^"*/^ 

n 

and similarly for more complicated correlation functions: 

{0\x^{t)x^{0)\0) = (0|e'^*/V(0)e^*^*/V(0)|0) 
= ^|(O|x2(O)|0„(O))|2e-^"*/^ 
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But it is difficult to extract the energies from such oscillatory functions. 
It would be much easier if we had decaying exponentials. We can get de- 
caying exponentials if we rotate from the real to the imaginary axis in time 
(Wick rotation) t —it 

{Q\x{t)x{Qm = ^|(0|.T(0)|^„(0))pe-^"^/'-' 

n 

|(0|x(0)|0)|^ + |(O|x(O)|0l(O))|2e-^^^/^ 

Later, we will see that this imaginary time formalism provides another 
important advantage for Monte Carlo applications. 




2.2. Path integrals 

The evaluation of the quantum- mechanical transition 
amplitude can be accomplished in several ways. In 
the 1940s, Richard Feynman developed an alterna- 
tive formulation^ of quantum mechanics as the topic 
of his Ph.D. thesis. In his formulation, the quantum 
mechanical law of motion expresses the transition 
amplitude as a sum over histories or a path integral: 

Z{b,a)r^ J2 cxp{iS[x{t)]/n). 

all paths x{t) 
from a to b 

All paths contribute to the probability amplitude, 
but with different phases determined by the action S'[a:;(t)]. Evaluating 
the transition amplitude in this formalism requires computing a multi- 
dimensional integral, but no differential equations need to be solved and 
no large matrices need to be diagonalized. His approach also has a con- 
ceptual advantage: the classical limit clearly emerges when small changes 
in the path yield changes in the action large compared to h, causing the 
phases to cancel out so that only the path of least action SS — dominates 
the sum over histories. 

For a single particle constrained to move only along the x-axis, the 
action, being the time integral of the Lagrangian (kinetic minus potential 
energy), is given by 

S = Jdt L{x, x) = jdt {k - . 

To define the path integral needed to evaluate the transition amplitude, 
one first divides time into small steps of width e, where Ne ~ tb — ta for 
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Fig. 1. A typical path in the path integral for a nonrclativistic particle moving in one 
dimension. 



large integer N. The path integral is defined as 



1 r 

Z(b, a) = lim — / 

W-too A J_, 



dx^dx2 dxN-1 ^S[x{t)]/h 

A A A 



where A is a normalization factor depending on s = {tb — ta) / N and chosen 
so that the path integral is well-defined (see later). In a nonrclativistic 
theory, paths cannot double-back in time, so a typical path looks like the 
one shown in Fig. 1. 



2.3. Relationship to the Schrddinger equation 

It is interesting to show how the above expression is equivalent to the 
familiar Schrddinger equation. The probability amplitude ip{xb,tb) at time 
tb, assuming an amplitude i^i^ai ta) at a-n earlier time ta, is given by 

1p{xb,tb) = / Z{b,a) 'lp{Xa,ta) dXa- 



Take ta = t and tb = t + e one time slice away, then 

1 f°° \ie^fxb+Xa Xb-Xa 



ll}{xb,t + £) 



1p{Xa,t) dXa, 
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where in L(x,x), the speed is i = {xb—Xa)/s and the mid-point prescription 
X — *■ {xb+Xa)/^ is used. If the particle is subject to a potential energy, then 
L = ^mx^ — V{x,t), and it is convenient to write Xb = x, Xa = x + r] so 
that 

t + e) = - / e'"'vVi2he)^-^eVix+n/2.t)/h^^^ ^ ^) 

The rapid oscillation of e«™')^/(2fe) except when 77 ^ leads to the 

fact that the integral is dominated by contributions from 77 having values 
of this order. Given this, we can expand all expressions to 0{e) and 0{rf')^ 
except e*™'' /^"^^^^ [tp refers to ilj{x,t)), yielding 

is dip jf d^ip 



^im,rr/{2he 



V'- -^yix,t)lp + 7]- 



h ' ' ' dx 2 dx^. 

Matching the leading terms on both sides determines A (using analytic 
continuation to evaluate the integral): 

Given the following integrals. 



oo 



then the 0(e) part of the equation yields 
h dip d^ip 

This is the Schrodinger equation! 

2.4. Example: free particle in one dimension 

Now let us explicitly evaluate the path integrals for several simple examples. 
First, consider a free particle of mass m in one dimension. The Lagrangian 
of a free particle in one dimension is 

L ~ —mx^, 
2 

so the amplitude for the particle to travel from Xa at time ta to location Xb 
at later time tb is 

{xb{h)\xa{ta)) ^ / X'.T(t)exp(iS'[6, a]/?i). 
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summing over all allowed paths with x{ta) ~ Xa and x{tb) = Xb- The 
classical path Xci(t) is obtained from SS = and boundary conditions: 

(t-ta) 



Xclit) = 0, Xcl{t) = Xa + {Xb - Xa) 

and the classical action is 

Sci[b,a] = j dt ^mx^^ 



{tb - ta) ' 
m{xb - Xa)"^ 



JU ' 2{tb-ta) 

Write x{t) = Xci{t) + x{t) with x(ia) = x(ih) = 0, then 
S[b,a] = 5ci[6,a] + / dt imx^ 



where S'ci[6, a\ is the classical action. Notice that there are no terms linear 
in x{t) since Sc\ is an extremum. The transition amplitude becomes 

Z(6,a) = F(T)cxp(i^ei/fi), 



where T ^ tb — ta- Partition time into discrete steps of length e, use the 
midpoint prescription, and note that xo = Xn = 0: 



Jo 
dtx^ 



n 

V 1=1 



dxi 
A 



A 



1/2 



N-1 



dtx^ = - "^{Xj+i-Xjf, 
•^0 ^ j=o 



A multivariate Gaussian integral remains: 



\2he 



XjMjkXk 



F(T) 



N/2 



where M is a symmetric {N — 1) x (A^ — 1) matrix 



2he 



XjMjkXk 



M = 



2-1 • ■ 
-1 2-1 O-' 
0-1 2-1-. 



9 



Gaussian integrals of a symmetric matrix A are easily evaluated, 



dot A 



so the result for F{T) is 



FiT) 



1/2 



.27rifecletM/ 

We now need to compute det(Af). Consider an n x n matrix _B„ of form 



/ 2h-b 0---\ 
-b 2b-b • • • 
0-b 2b -b--- 



\ 



J 



Notice that 



det Bn = 2b det B^-i + b det 



V 



-6 


-60 •••\ 









Bn-2 1 



= 26det_B„_i — b" dctBn-2- 
Define /„ = det i?„ . then we have the recursion relation 

In+l = 2bl„ - bHn-U I-l =0, /o = 1, n = 0, 1, 2, , 
Rewrite In+i = 2&/„ — b^In-i, I~i =0, /g = 1 as 



In+l 
In 



f2b-b-'\ ( /„ 

^ 1 y 



2b -b^ 
1 



It is then straightforward to show that 
f2b^b^Y 



so that 



V 1 



In+l 
I. 



nb" 



-{n-l)b" J ' 



{n + l)b" -nb"+^ 



nb 



n-l 



-{n- 1)6" y V 1 / ' 



26 



and thus, /„ = det i?„ = (n+l)6". Here, 6=1 and n = A^— 1 so det AI = N, 
and using Ne = ti, — ta, we obtain 

1/2 
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The final result for the transition amplitude for a free particle in one di- 
mension is 

2.5. Infinite square well 

As a second example, consider one of the first systems usually studied when 
learning quantum mechanics: the infinite square well. This is a particle 
moving in one dimension under the influence of a potential given by 

, f br < X < L, 

[ cx) for X < and x > L. 

The path integral for the transition amplitude in this case is given by 

= i i ^ • • • i ^ { S g (^^+1 I ' 

where the paths are limited to < x < L. Gaussian integrals over bounded 
domains produce error functions (erf), making direct evaluation in closed 
form difficult. A simple trick^ to evaluate the path integral in this case is 
to extend the regions of integration to — oo < x < oo, but subtract off all 
forbidden paths. In so doing, we express the square well amplitude as an 
infinite sum of free particle amplitudes. 

To help describe these path cancellations, let us refer to unbounded 
paths which can visit any value of x as free paths, and we shall refer to 
paths which never cross an a; = nL boundary, for integer n, as confined 
paths. The set Sc of all confined paths from Xa at time ta to Xb at later 
time tb is given by the set Sp of all free paths from Xa to Xb, excluding all 
free paths which cross the x = oi x = L boundary at least once. The set 
of free paths which cross the x = or x = L boundary at least once can be 
partitioned into two non-intersecting subsets: the set Sil of paths whose 
last boundary crossing occurs at the x = boundary at time ti, and the set 
Sir of paths whose last boundary crossing occurs at the x = L boundary 
at time ti , for all possible values of ti . For a particular ti , each subset is 
the set of all free paths from Xa at ia to x = (or x ~ L) at time ti with all 
confined paths from x ~ (or a; = L) at ti to Xb at tb- The set expression 
Sc = Sp — Sir — SiL is illustrated graphically in the top row of Fig. 2. 
In this figure, the solid lines represent all confined paths between the end 
points of the line (those that do not cross an nL boundary, for integer n). 
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-IL -L l>x„L 21 3L -2L -L Ox„L 2L 3L -2L -L Ox„L 2L 3L -2L -L Ox„L 2L 3L 






2L -L Ox^L 2L 3L -2L -L Ox^L 2L 3L '° -2L -L X„L 2L 3L -2L -L Ox„L 2L 3L '" 




I 



-2L -L ax„L 2L 3L -2L -L U X„L 2L 3L -2L -L Ox.L 2L 3L -2L -L Dx.L 2L 3L 



Fig. 2. Expressing the set of confined paths (solid fines) of the infinite square weU in 
terms of sets of free paths (dashed lines). The minus signs are set difference operators. 
Details are described in the text. The circles indicate the applications of reflections which 
preserve the free-particle action. 



and the dashed lines represent all free paths between the end points of the 
line. Remember that there is no doubling back in time. The minus signs 
are set difference operators. 

Next, consider a particular path in set ^il. The section of the path 
from .T = at ti to Xb at can be reflected x —x without changing the 
free-particle action. This is because the free-particle Lagrangian depends 
only on the square of the speed which is left unchanged except at a finite 
number of points, a set of measure zero. Hence, as far as the path integral 
is concerned, the set Sil can be replaced by the set S[]^ of all free paths 
from Xa at to to x = at ti with all confined paths from a; = at to 
—Xb at tb, for all possible ti. With a little thought, one can see that S'^j^ is 
the set SpiL of all free paths from Xa at ta to —Xb at tb^ excluding the set 
S2L of all free paths from Xa at ta to x = —L at ti with all confined paths 
from X = —L at ti to —Xb at tb- The set expression 5*^^ = Sfil — S2L is 
illustrated in the second row of Fig. 2. 

Consider a particular path in the set Sm- The section of the path from 
X = L at ti to Xb at tb can be reflected x — *■ 2L — x without changing the 
free-particle action, so Sir can be replaced by the set S[j^ of all free paths 
from Xa at ta to X ^ L at ti with all confined paths from x = L sd ti to 
2L — Xb at tb, for all possible ti. Again, it is not difficult to see that S[ji 
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is the set Sfir of all free paths from Xa at ta to 2L — Xb at tb, excluding 
the set S2R of all free paths from Xa at ta to x = 2L at ti with all confined 
paths from x = 2L at ti to 2L — Xb at tb, as illustrated in the third row of 
Fig. 2. 

This procedure can be iterated again and again until one obtains the 
final result as a sum of free propagators to an infinite number of mirror 
points: 

{^b : tb\-^a T ta) conf (-^b 1 ^fc |*^a 1 ^a) free 

( -^b 1 tb \ -^a 7 ^a) free (2Z/ Xb , tb \ Xa 5 ^a)free 
+ (-2L + Xb, tb\Xa,ta)hcc + (2i + Xb, tb\Xa , ta) hoc H , 

00 

= E {^C^nL + Xb,tb\Xa,ta)hcc - C^nL - Xb,tb\Xa,ta)frccy 
n— — 00 

Substituting the amplitude for a free particle into this expression yields 

1/2 

{xb{tb)\Xa{ta)) cont 



2TTih{tb - ta) 

E°° / ( im{2nL+Xb — Xa)^ \ ( im(2nL — Xb — Xa)^ \ 

„rn 2Htb-ta) r 2.(4-0 I 

Apply Poisson summation and integrate the Gaussian 

n— — 00 j — — oo ^ 

/ ds expl —las ± ips ] = \ — exp I , 

to finally obtain the spectral representation of the transition amplitude for 
an infinite square well: 

00 

{Xb{tb)\xa{ta))^cn = E ^n(^^>X(^a)e-^^"(*^-*")/^ 
ri=l 

ri^n^h^ [2 . /'mTX\ 

^-^^^^ ^"(^) = VZ'"H— J- 

The familiar energy levels and wavefunctions have been obtained using only 
path integrals. 

2.6. Free particle in ID periodic box 

For our third example, consider a particle moving in one dimension with 
periodic boundary conditions at x = and x = L. Once again, directly 
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2L -L Ox„L 2L 3L -2L -L Ox„L 2L 3L 



Fig. 3. Each path in the periodic box is equivalent to a free path leading to an appro- 
priate mirror point. The equivalent free path is found by horizontally translating sections 
of the periodic path to form a continuous free path. 



enforcing boundary conditions on the path integrals is difficult, so it is best 
to proceed by using a trick similar to that used for the infinite square well, 
that is, to express the set of allowed paths in terms of an equivalent set 
of unrestricted paths. Each path in the periodic box is equivalent to a free 
path leading to an appropriate mirror point. The equivalent free path is 
found by horizontally translating sections of the periodic path to form a 
continuous free path, as shown in Fig. 3. The resulting amplitude is a sum 
of free amplitudes to an infinite number of mirror points: 

oo 

(^6 , ^6 1 1 ^a) periodic — ^ ^ {•^b ^ I'^a ; ^a) free ■ 

n— — oo 

Substitute the amplitude for a free particle, 

1/2 



{xb{tb)\Xa{ta)) 



cxp' 

n——oo 



2T:ih{tb - ta) J ' \ 2h{tb~ta) 

apply Poisson summation, and integrate the Gaussian, 

oo go /.oo 

ds expf — ias^ ± iBs] — \ — exp | 



n— — ao 
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to obtain the spectral representation of the transition amplitude: 

oo 

{Xb{tb)\Xa{ta)) = V'n(^&X(a;a)e-*^"(*^-*"'/^ 

n——oo 

" ~ 2m' — L ' — ^jjy' 

The quantization of the momentum, and the familiar energy levels and 
wavefunctions have once again emerged using only path integrals. 



2.7. Simple harmonic oscillator in ID 

The one-dimensional simple harmonic oscillator is our last example. The 
kinetic and potential energies of a simple harmonic oscillator of mass m 
and frequency lo are given by 

K = imi^, U ~ ^muj^x^, 

so the action is 

dt (imi^ - ^rauj'^x'^) . 
The classical equations of motion are 

SS = Q Xc\+ LO^Xcl = 0, 

and the value of action for the classical path is 

{xl + xl) COs(wr) - 2XaXb , 

where T — ti, — ta- To calculate the amplitude Z{b,a) = {xb{tb)\xa{ta)) sho, 
write the path as a deviation from the classical path: 

x{t)=xci{t) + x{t), X{ta)^x{tb)^0. 
The amplitude can then be written as 

Z{b,a) = F{T)cxp{iSci/h), 
FiT) = J^x exp I £dt ix' - u^'x') 



Sc 



muj 



2 sin(wT) 
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Partition time into discrete steps of length e and use a midpoint prescrip- 
tion: 



'Dx= 



n 

V 1=1 



dXi 
A 



2nihs 



1/2 



N-1 r 



N/2 



2 £ ^ / , \2 



W-1 



-(Xj+i+Xj) 



— 



, 1=1 



A multivariate Gaussian integral remains: 



F{T) 



2'Kifie 



N/2 



where M is a symmetric {N — 1) x (A^ — 1) matrix 





" 2 


-1 










"2 1 00 


M = 


-1 


2 


-1 


... 




1 2 1 ••• 





-1 


2 


-1 • • • 


4 


1 2 1 ••• 



Such Gaussian integrals are easily evaluated: 

m \ 1/2 



F{T) ^ (: 



.27rifedetM/ 

Now wc must compute detil/. Consider det(_B„), where the n x n matrix 
Br, has the form 



/a 6 • - A 
6 a 6 • • 
5 a 6 • • 



V 
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B„ matches M for n = iV - 1, a = 2(1 - e^uj'^/'i), and 6 = -(1 + e^oj'^/'i). 
Notice that 



det Bn ~ a det B^-i — b det 





(b 


50 


6det 





Bn 




Is 






&2 det Bn- 


-2- 
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Define /„ = det _B„ to obtain the recursion relation 

In+i = ain - 6^/,i-i, I-i =0, /o = 1, n = 0, 1, 2, 
Rewrite this recursion relation as 

'In+l\ ^ f'^-b'] f In \^ fa-b^y flA 
In J \l J \In-lJ \1 J \loJ ' 

diagonalize as follows. 

'a-fe2\ _ /A+ 
1 j " I A_ 



-(a± Vo 



A± = - (a± Va2 -4fe2 



then we have 



Thus, 



11/' A, - A_ V -1 A 



detB„-^ -— , (A+^A_). 



Using A± = 1 ± iLue + 0{e ) yields 



lim e det M = lim e 



— ({l + iujef-il-iLoef), 



1 // iuTV" ( looT^"" 

= lim 1 + - 1 

^Zl '^^^\\ N J y N ^ 

The final result for the path integral is 

{xb{tb)\xa{ta))sho = - .. . 7^^,^ — TTT cxp |iS'cl/?l| . 

Consider the temporal evolution of a Gaussian wave packet for this 
system. If the probability distribution corresponding to the initial wave 
packet at time ta = is a Gaussian: 



cr\/27r V 2fT 




-10 1-10 1-10 1-10 1 
X (au) X (au) x (au) x (au) 

Fig. 4. Time evolution of a Gaussian wave packet for a simple harmonic oscillator of 
mass m = Ig/mol = 1.66 X 10~^^kg and frequency i^; = 3 X lO'^'^radians/sec. The initial 
wave packet at t = is centered at 0.5 au and has initial width a =0.14 au. Note that 1 au 
(atomic unit) = 0.529 angstrom. The dot indicates the location of a classical oscillator 
of the same mass and frequency. The probabilities are shown for times t = nT/8 for 
n = 0, 1, 2, . . . , 7, where T is the period. 



then the probabihty amphtude at a later time t), is 

/oo 
dxa Z{b, a) (j){Xa, 0), 
-oo 



-imioy 



ha sin(cjth) 



so the probabihty distribution remains a Gaussian but with a varying width 
s{t): 

{xb — X cos{ujti,))'^ " 



1 



where the width is given by 



exp 



2s^{tb) 



s(4) = cr <{ cos^(lj4) + ^^^2^2q-4 sin^Mfc) 
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The time evolution of such a Gaussian wave packet for a simple harmonic 
oscillator is shown in Fig. 4. Note that this evolution was completely cal- 
culated above using path integrals; the Schrodinger equation was not used. 



2.8. Correlation functions and observables 

We have so far seen that path integrals give us simple transition amplitudes, 
such as 

{xb{tb)\xa{ta)) = J Vx expj^^ dtL{x,x)^, 
but this important result generalizes to more complicated amplitudes: 

{Xb{tb)\ X(t2) x(ti) \Xaita)), 

Vx x(t2)x{ti) exp|— j dt L{x,x)> , 

for ta < ti < t2 < tb- In the imaginary time formalism, paths contribute to 
the sum over histories with real exponential weights (not phases): 

{Xb{Tb)\ xIt2) x{ti) \Xa{Ta)) 

= / Vx x(t2)x(ti) exp<— — / dr L(x,x) 

Now the classical path gets the highest weighting. Note that weights are all 
real and positive since the action is real. This fact will be crucial for the 
Monte Carlo method. 

Another important fact is that correlation functions (vacuum expecta- 
tion values) can be obtained from ratios of path integrals. For example, a 
two-point function can be obtained from 

/nl <i \ a r MT)\x{t2)xit,)\xai-T)) 
mt2}x{t,m = ^ln:a^ {xb{T)\x4-T)) ' 



/ Vxx{t2)x{ti) 

J a 


exp 1 




' dTL{x, x) > 

— oo J 


J 


f Vx exp j 




/ dTL{x^ x) 

' — oo j 





and more complicated correlation functions can similarly be obtained. In 
fact, any correlation function can be computed using path integrals. 
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For example, consider the simple harmonic oscillator. Evaluating path 
integrals as before, the following correlation functions can be obtained: 

(0|x(ti)|0) = 0, 
(0|x(r2)x(ri)|0) = ^e--(^-^^), 



2 



where ti < T2 < T3 < T4. Comparison with the spectral representations 
tells us 

h 



i|x(t)x(0)|0) = e-^^ Ei-Eo = hu, |(l|a;(0)|0)|^ = 



2mti> 2mLj 
As another example in the SHO case, consider exciting the vacuum with 
the a;(r)^ operator: 

(0|x2(r)x2(0)|0)= (^^)'(^l + 2e-2-- 

Compare with the spectral representation at large time separations, 
lim (0|x2(r)x2(0)|0) = |(0|a;2(0)|0)p + |(2|a;2(0)|0)|2 g-^^^"^")*/'^ + 



1 + 2e 



-2UJT 



\2muj ^ 

to arrive at the following interpretation: 

E2- Eq^ 2hiu, 

One last example in the SHO: to determine the expectation value of x(0)^ 
in first-excited state, evaluate 



x(r) x\^t) x{0)\0) = 3 



^ 2muj ^ 

and compare with its spectral interpretation at large times: 
lim {Q\x{t)x^{\t)x{Q)\Q) 

r— >oo 

- |(0|x(0)|l)|2(l|2;2(0)|l) e-(^i-^")"/'"' + -- - , 

since (0|a;(0)|0) — (0|a;(r)|0) — 0. By inspection and using previously de- 
rived results, one concludes that 

{i\x\m 



2mu} 
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Now pause for reflection. We have seen that observ- 
ables in quantum mechanics can be extracted from corre- 
lation functions (vacuum expectation values), and that the 
imaginary time formalism is a great trick for assisting in 
such extractions. Correlation functions can be computed 
via ratios of path integrals 



(0|:r(<2).x(ti)|0) 



J 


f Dx x{t2)x{ti) exp 1 




i dTL{x^ x) > 

— oc J 


J 


f Dx exp j 




/ drL{x, x) 





3. Monte Carlo integration and Markov chains 

In rare situations, the path integrals in transition amplitudes can be com- 
puted exactly, such as the simple harmonic oscillator and a free particle. 
Sometimes the action can be written S ~ Si^ + gSj, where 5*0 describes 
the free motion of the particles and Si describes the interactions of the 
particles, but the coupling g is small. Typically, the path integrals using 
5*0 are Gaussian and can be exactly computed, and the interactions can be 
taken into account using perturbation theory as an expansion in g. How- 
ever, if the interactions are not weak, such as in quantum chromodynamics, 
one must somehow numerically evaluate the needed path integrals with 
powerful computers. 

The trapezoidal rule and Simpson's rule are not feasible for integrals of 
very large dimension. These methods require far too many function evalua- 
tions. One of the most productive ways of proceeding is to start gamblingl 
The Monte Carlo method comes to our rescue. The basic theorem of Monte 
Carlo integration is 



fix)d^x^V{f)±vJ^^'^ ^^^^ 



V V iV ' 

N N 

i=i 1=1 

where the N points xi, . . . , xn are chosen independently and randomly with 
a uniform probability distribution throughout the Z?-dimensional volume V. 
The method is justified by the law of large numbers and the central limit 
theorem. In the limit ^ oo, the above Monte Carlo estimate tends to a 
normal distribution and the uncertainty tends to a standard deviation. 
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The above method sounds too good to be true. Ahhough the above 
method should work in principle, it is impractical for evaluating quantum 
mechanical path integrals, unless suitably modified to incorporate impor- 
tance sampling. Before discussing this, a closer look at the simple Monte 
Carlo method is warranted, and this can be facilitated with a quick review 
of probability theory. 



3.1. Quick review of probabilities 

Consider an experiment whose outcome depends on chance. Represent an 
outcome by X called a random variable, and the sample space fl of the 
experiment is the set of all possible outcomes. X is called discrete if O 
is finite or countably infinite, and continuous otherwise. The probability 
distribution for discrete X is a real-valued function px on the domain 
satisfying px{x) > for all x G ^ and J^xenP^i'-'^) " ^- ^'^^ subset E 
of n, the probability of E is P{E) ~ J2xeEPx(^)- ^ sequence of random 
variables Xi, X2, . . . , Xn that are mutually independent and have the same 
probability distribution is called an independent trials process. 

For a continuous real- valued X , the real- valued function px is a prob- 
ability density and the probability of an outcome between real values a 
and b is P{a < X < b) = pxis)ds. The cumulative distribution is 
Fx{x) = P{X < x) ~ J^^px{s)ds. A common probability density is the 

normal distribution pxix) = ^ g-i^-p-f /i'^<^^) _ 

y 27r(7 

The expected value of X is 

E{X)^J2^P^(^) / sPxis)ds 

The expected value satisfies E{X + Y) = E{X) + ElY) and E{cX) = 
cE{X), and for independent random variables X,Y one has E{XY) = 
E{X)E{Y). One can show that E{X) is the average of outcomes if repeated 
many times. For a continuous real- valued function /, one can also show that 



EifiX)) = J2 fi^) Px{x) = / /(s) px{s) ds 

xen ^ 

To see this, group together terms in J2xfix)Pxix) having same f{x) value. 
Denote the set of different f{x) values by JF, and the subset of fl leading 
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to same value of f{x) by ^f{x): then 

The variance of X is V{X) = E{ {X ~ E{X) f ), and the standard 
deviation of X is cr(X) = -^/V^X). The variance satisfies V{cX) ~ c^V{X) 
and V{X + c) = V{X). and for independent random variables X, Y, one has 
V{X+Y) ~ V{X)+V{Y). Let Xi, . . . ,Xn be an independent trials process 
with E{Xj) = /i and V{Xj) = a^, and define An = {Xi+X^^- ■ ■+Xn)/N, 
then one can easily show 

E{An) - V{An) = (J^N. 

An important theorem in probability and statistics is known as Cheby- 
shev's inequality: Let X be a random variable (discrete or continuous) 
with E{X) = /i and let e > be any positive real number, then 

Proof. Let px{x) denote the probability distribution oi X, then the prob- 
ability that X differs from fi by at least e is 

P{\X-^i\>e)= E px{x). 

la; — >e 

Considering the ranges of summation and that we have positive summands, 

^ \x — fj.\'>€ \x—fi\'>e 

but the rightmost expression is 

J2 Px{x)^e'P{\X-f,\>e). 

|a:— >e 

Thus, we have shown F(a;) > e^FdX - ^1 > e). □ 

An important consequence of Chebyshev's inequality is the weak law 
of large numbers : Let Xi, X2, ■ ■ ■ , Xn be an independent trials process 
with E{Xj) ~ /i and V(Xj) = a^, where are finite, and let A^ = 
(Xi + + • • • + Xn)/N. Then for any e > 0, 

lim P{\An - mI > e) = 0, lim P{\An - fi\ < e) = 1. 
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Proof. We previously stated that E{An) ~ ^ and V{An) = a^/N, and 
from the Chebyshev inequality, 

PHAn ~^^\>e)< = ^ 0- □ 

This is also known as the law of averages, and applies to continuous random 
variables as well. 

A different version of the above law is known as the strong law of 
large numbers: Let Xi,X2, ■ ■ ■ , be an independent trials process with 
E{Xj) = ji and V{Xj) ~ cr^, where /i, a are finite, then 

P f lim (Xi + ^2 + • • • + Xn)/N = 1^1=1. 

Proof. We shall assume that the random variables Xj have a finite fourth 
moment E{Xj) — K < oo. The finiteness of E{Xj) is not needed, but 
simplifies the proof. For a proof without this assumption, sec Ref. 3. 

Define Y, = Xj - so E{Yj) = 0. Define B = E{Y^^) < oo and C = 
E{Yf) < oo, then define An = {Yi + Y2 + ■ ■ ■ + Yn)/N. Consider 

N^E{A%) = £;( (Yi + ^2 + • • • + Yn)^ ). 

Expanding Aj^ yields terms of the form F/, Y-^Yj, Y^Y^, Y^YjYk, and 
YiYjYkYi, where i,j,k,l are all different. Given that E{Yj) = and all Yj 
are independent, then 

E{Y[^Yj) = E{Y^^)E{Yj) = 0, k, I aU different) 

E{Y^Y,Yk) = E{Y^')E{Y,)E{Yk) ^ 0, 
E{YiY,YkYi) = E{Yi)E{Y,)E{Yk)E{Yi) = 0, 
E{Y^Y^) = E{Y^)E{Yf) ^ B\ 

Since the random variables are identically distributed, then E{Y^) and 
EiY^Yj') are independent of i, j, so we have 

N^E{A%) = NE{Y^) + 6(^)£;(K,2r/) = NC + 3N{N - 1)B\ 

Since < V{Yf) = E{{Y^ - E{Yff) = E{Yf) - E{Yff then B"^ = 
EiYff < E{Yf) = C so E{A%) < C/N'^ + 3C/N^, which means 

/ 00 \ 00 00 ^ „ Q/^X 



yN=l J N=l N=l 



This implies X]w=i ^ with unit probability, and convergence of this 

series implies limAr_>oo Aj^ = 0, which means that lim7v_too An ^ 0. □ 
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This proves that E{X) is the average of outcomes for many repetitions. 

Uncertainties in Monte Carlo estimates depend upon the celebrated 
central limit theorem: Let Xi, X2, . . . ,Xn be independent random vari- 
ables with common distribution having E{Xj) = n and V{Xj) = cr^, where 
^, a arc finite, and let Ajv = (^1 + X2 + • • • + Xn) /N. Then for a < b, 



Alternatively, the distribution of {Xi + • • • + Xn — N ji) / {a\/ N) tends to 
the standard normal (zero mean, unit variance). 



Proof. Define Sn = N{An ~ m)/ (fv N), and if i is a real- valued parame- 
ter, then 

£;(e*^") ^ E 

= E 



^t{Xi-f,)/{aVN)^t{X2-fJ^)/{aVN) . . . gt(^iv-/^)/(<TVlV) 
^t{X,-|,)/(cr^/N}^ ^ ^t{X2-|,)/{cr^/N}^ . . . _g ^t{X - / {a^/N) 



E e 



,t(A-i-M)/('^VlV)' ^ 



where, in the last two steps above, we used, respectively, the facts that the 
Xj are independent and are identically distributed. Now carry out a Taylor 
series expansion about t = 0: 

^ Ltix,-,yi.^,\ ^ ^ A ^ ^ t^iX^-j^ ^ 

From this, one sees that 

lim E fe*(^i-M)/(-Vlv)\ = f 1 + ^ + . . . ] = gtV2_ 

This is the moment generating function of the standardized normal distri- 
bution. The moment generating function of a random variable X is defined 
by Mx{t) = E{e*^). If X and Y are random variables having moment gen- 
erating functions Mx (t) and My (t) , respectively, then there is a uniqueness 
theorem that states that X and Y have the same probability distribution if 
and only if Mx{t) = Afy(t) identically. The use of this theorem completes 
the proof of the central limit theorem. □ 
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3.2. Simple Monte Carlo integration 

Recall that for a continuous real- valued function f{X) of a continuous ran- 
dom variable X having probability distribution px{s), the expected value 
of f{X) is 



Px{x) 



Now consider a uniform probability density 

l/{b-a), a<x<b, 
0, otherwise. 



If one uses this probability density to obtain TV outcomes Xi, X2, . . . , ^tv, 
and applies the function / to obtain random variables Yj — f{Xj), then 
the law of large numbers tell us that 

N ,h 



^ ^ ™ E{Y) = E{f{X)) = j^—^ I fis)ds. 



3= 



Define 



then 



1 ^ 



b 

{b-a)lun{f) = I f{s)ds. 



It is straightforward to generalize this result to multiple dimensions. Natu- 
rally, a key question is then: how good is such an estimate for finite A^? 

For large iV, the central limit theorem tells us that the error one makes in 
approximating E{X) by An is a/^/N = yJV{X)/N. For Y = f{X) as be- 
fore, the error in approximating E{f{X)) by Y., I{Xj)lN is ^V{f{X))/N. 
One can then use the Monte Carlo method to estimate the variance 
V{f{X)): 

V{Y) = E{{Y - E{Y)f) « ((/ - = (f) {f)\ 

If pxix) is not uniform but can be easily sampled, then one can use 
Px{x) to obtain N outcomes Xi, X2, . . . , X^, then apply the function / to 
obtain random variables Yj ~ f{Xj), and the law of large numbers tell us 
that 

1 j2 E{Y) = E{J{X)) ^ ['pxis) fis)ds. 

1=1 
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Fig. 5. A simple one-dimcnsional Monte Carlo integration. The integrand x{l — x) is 
shown on the left, while Monte Carlo estimates, with error bars, are shown on the right 
for several values of A^, the number of random points used. 



To summarize, simple Monte Carlo integration is accomplished using 



J^pix) fix) (/)±^ 



ip) if)' 



N 

N N 

i=l i=l 

where the N points xi, . . . , xn are chosen independently and randomly with 
probability distribution p{x) throughout the /^-dimensional volume V, and 
this density satisfies the normalization condition p{x)d^ x = 1. The law 
of large numbers justifies the correctness of this estimate, and the central 
limit theorem gives an estimate of the statistical uncertainty in the estimate. 
In the limit N ^ oo, the Monte Carlo estimate will tend to be gaussian 
distributed and the uncertainty tends to a standard deviation. 

Monte Carlo integration requires random numbers, but computers are 
deterministic. However, clever algorithms can produce sequences of num- 
bers which appear to be random; such numbers arc called pseudorandom. 
Devising a good random number generator is a science in itself, which will 
not be discussed here. Random number generators often utilize the modu- 
lus function, bit shifting, and shuffling to produce random 32-bit or 64-bit 
integers which can be converted to approximate uniform deviates between 
and 1. The Mersenne twister^ is an example of a very good random num- 
ber generator. It is very fast, passes all standard tests, such as the Diehard 
suite, and has an amazingly long period of 2^^^'^^ — 1. 
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To see the method in action, consider a simple one-dimensional example: 

1 

/ x{l — x) dx — — ^ 0.166666 • • • . 
Jo 

The integrand is plotted in Fig. 5. Various Monte Carlo estimates, including 
error bars, for different values of TV, the number of random points used, are 
also shown in this figure. One sees the error decreasing as N increases, but 
notice that the method is not particularly efficient in this simple case. 

The Monte Carlo method works best for fiat functions, and is most 
problematic when the integrand is sharply peaked or rapidly oscillates. 
Importance sampling can greatly improve the efficiency of Monte Carlo 
integration by dramatically reducing the variance in the estimates. Recall 
that a simple Monte Carlo integration is achieved by 

f f{x) dx^'^^-f^Y.fix,), 

j=l 

where the chosen with uniform probability between a and b. Suppose 

that one could find a function g{x) > with j'^ g{x)dx = 1 such that 
fix) 

h(x) = ——— is as close as possible to a constant. The integral can then be 
evaluated by 

f f{x)dx= I h{x)g{x)dx ^ H^j), 

where the Xj are now chosen with probability density g{x). Since the func- 
tion h{x) is fairly flat, the Monte Carlo method can do a much better job 
estimating the integral in this way. The function g{x) accomplishes the im- 
portance sampling, causing more points to be chosen near peaked regions. 
Of course, one must be able to sample with probability density g{x). Also, 
how can one find such a suitable function g{x), especially for complicated 
multi-dimensional integrals? 

Random number generators generally produce uniform deviates. To 
sample other probability densities, a transformation must be applied. Con- 
sider a random variable U with uniform density pu{u) ~ 1 for < x < 1, 
and another random variable Y = (f>{U), where </> is a strictly increasing 
function. A strictly increasing function ensures that the inverse function 
is single-valued, and also ensures that if u + du > u, then y + dy > y for 
y = (j^iu). The probability density py associated with the random variable 
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Y can be determined using the conservation of probability: 

, , , . , , , .du \\d(f>~^(y) 

PY{y)dy ^ pu(u)du, pY{y)=pu[u)—=pu{(p [y)) — — . 

Usually the desired density py is known, so the function must be deter- 
mined. For a uniform deviate pu{u) ~ 1, then du = pY{y)dy, and integrat- 
ing yields 

du' = PY{y)dy w = Fy(0(u)) ^ (l){u) ^ Fy^ (u). 

F^^ is unique since is a strictly increasing function. In summary: a ran- 
dom variable Y with probability density py (jj) and cumulative distribution 
Fyiy) ~ J^^py{s) ds can be sampled by first choosing U with imiform 
probability in some interval, then applying the transformation 

Y^Fy\U). 

This transformation method is only applicable for probability densities 
whose indefinite integral can be obtained and inverted. Thus, the method 
is useful for only a handful of density functions. One such example is the 
exponential distribution: 

pviy) = Y~~b' fo'^ 0<y <b. 

The cumulative distribution and its inverse are 

fy (1 - e~y) 

Fyiy) / PY{s)ds = 



(1 

Fy\u) = -Infl- (1 



Now consider the integral 



ds 



0.873109. 



l + s/9 

The integrand is shown in Fig. 6, and various Monte Carlo estimates with 
and without importance sampling of the integral are also shown in the 
figure. Estimates using importance sampling (triangles) are seen to have 
much smaller statistical uncertainties for a given value of N, the number of 
random points used. 

Probability densities whose cumulative distributions are not easily 
calculable and invertible can be sampled using the rejection method. 
This method exploits the fact that sampling from a density px(x) for 
a < X < b is equivalent to choosing a random point in two di- 
mensions with uniform probability in the area under the curve px{x). 
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Fig. 6. Plot of the integrand e~° /{I + s/9) on the left, and Monte Carlo estimates of 
the integral on the right. Circles show estimates without importance sampling, whereas 
triangles show estimates using importance sampling. Statistical uncertainties using im- 
portance sampling are dramatically smaller for a given value of N , the number of random 
points used. 



For example, one can pick a random point with 
uniform probability in a box a < x <h horizon- 
tally and < y < max(px(a;)) vertically; the 
result is accepted if it lies below the curve, but 
if above the curve, the result is rejected and the 
procedure repeated until an acceptance occurs. 

If px{x) is sharply peaked, then a more efficient implementation of the 
method uses a comparison function f[x) satisfying f(x) > px(x) for all 
a < X < b and which can be sampled by the transformation method. 




3.3. Monte Carlo using stationary stochastic processes 

The sampling methods described so far work well in one-dimension, but for 
multi-dimensional integrals, the transformation and rejection methods are 
usually not feasible. Fortunately, highly multi-dimensional integrals can be 
handled by exploiting stationary stochastic processes. 

A stochastic process is a sequence of events Xt, t = 0, 1, 2, . . . governed 
by probabilistic laws (we shall limit our attention to discrete "time" t). 
Consider a system which can be in one of R discrete states si, S2, . . . , s_r 
(generalization to a continuum of states is usually straightforward). The 
system moves or steps successively from one state to another. Given pre- 
vious states of the system Xq, Xi, . . . , Xt-i, the conditional probability to 
find the system in state Xt at time t is denoted by P{Xo, . . . , Xt-^i\Xt) and 
may depend on previous states of the system and possibly t. 
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A stochastic process is stationary when the probabiUstic laws re- 
main unchanged through shifts in time. In other words, the joint prob- 
abihty distribution of (Xt , Xt+j-^ , ■ • • , ^t+j^ ) is the same as that of 
(Xt+hi Xt-i-h+ji, ■ ■ ■ , Xt+h+j„) for any h. For such processes, the mean 
E{Xt) = n is independent of t (if it exists) and the variance i?((Xt — /i)^) = 
(T^ is independent of t if E{Xf) is finite. However, the Xt are usually not in- 
dependent random variables, but the autocovariance E((Xt— fi}{Xg^ ^)) = 
R{\t — s\) depends only on the time difference |i — s|. The autocorrelation 
function is defined by p{t) = R{t)/R{0) so that p(0) ~ 1 and — 1 < p{t) < 1 
for all t (from Schwartz's inequality). 

The Monte Carlo method described so far requires statistically inde- 
pendent random points. In order to use points generated by a stationary 
stochastic process, we must revisit the law of large numbers and the central 
limit theorem for the case of dependent random variables. 

The law of large numbers for stationary stochastic processes: 
Consider a stationary stochastic process Xi,X2, ■ ■ ■ with E{Xk) — fJ. and 
autocovariance i?(s) ^ E{{Xk— p){Xk+s — tJ-)) satisfying |i?(s)| < oo, 

and define Xn ^ X2-\ \-Xn)/N, then 

lim P{\Xn - Ai| > e) = 0, for any e > 0. 

Proof. Define Y„ = X^ - and Yn = {Yi + h Yn)/N, then 

1 ^ 1 
EiYl) = ^e[Y.Y,' + 2J2y,Y,) = —i^NRiO) + 2j2m~k)), 

k=l k<l k<l 

-^(0) ■ ' 'Z(N-k)Rik), 



so that 



N 

fc=i 



-2 



NE{Y-) = i?(0) + Y.k=i mk){N - k)/N 



<\Rm+i:k=im{k)\{N^k)iN, 



<\Rm+Y.tim{k)\ 



-2 



Since Y,j \R{:i)\ < oo. then NE{Yj^) < oo so \imN-.oo E{Y j^) = 0. The 
Chebyshev inequahty tells us that P{\Xn — /^| > e) < E{{Xn — p)^)/e'^ so 

lim E{(Xn - m)^) = implies lim P{\Xn - mI > e) = 0, 

which proves the weak law of large numbers for a stationary stochastic 
process with an absolutely summable autocovariance. □ 
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One can show that the Umiting vahie of the variance is 

oo 

lim NE{(Xn - fif) V R{k). 



k— — oo 



Proof. Since the autocovariance is absohitely sunimable J^k 1-^(^)1 
then for any e > there exists a q such that J^kLi 21-^(9+^)1 < £/2- Hence, 



N-l 



N-l 



N-l 



\Y,m)~NE{Y^) = RiO) + 2j2R{j)-[RiO)+Y.2R{k){N-k)/N 

3 = -{N-l) J = l 



k=l 



ELi^k\Rik)\ /N + Ek=,\imm\ IN 



<ELimm\ /N+Ek=,+im{k)\ 

<ELi2fc|i?(fc)| /N + e/2. 

Since q fixed and finite, we can always increase N so that 
X;Li 2fc|i?(A:)| /N < e/2 which holds as TV ^ oo. Thus, 



N-l 



j = -iN-l) 



—2 , 

N, 



< e, 



which proves the required result in the limit as iV oo. 



□ 



The M -dependent central limit theorem states the following: Let 
Xi, X2, . . . , Xn be a stationary M-dcpendent sequence of random variables 
{Xt and Xt+s are independent for s > M) such that E{Xt) = E{Xi) = 



and E{{Xi- ^Y) < 00, and define Xn = (Xi + X2 



a2 = E{{Xi-^if) + 2Y.ll^ E{{Xi-^i){Xh+i-fi)). Then for a < b, 



+ Xn)/N and 



aa , — , ba 
hm P —= < {Xn - < 



2tT J a 



In other words, the distribution of [X\ + • • • ^X^ — N ^) / {a\/ N) tends to 
a standard normal distribution (zero mean, unit variance). For the proof 
of this very important theorem, see Ref. 5 or Ref. 6. One version of the 
proof relies upon splitting the summation into blocks in such a way that 
the resulting variables are essentially independent. Note that 

M 

^ ^ R{h) = NE{{Xn - ^J-f ) for N > A/, 
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where the autocovariance R{h) = R{—h) = i?((Xf— as usual. 

Monte Carlo integration using a stationary stochastic process is sum- 
marized by the following formula: 

N N-h 
i=l 1=1 

where the N points xi, . . . , xpf are elements of a stationary stochas- 
tic process with stationary probability distribution p{x) throughout the 
Z?-dimensional volume V, which satisfies the normalization condition 
J^p{x)d^x = 1. One requires that the autocovariance is absolutely 
summable, that is, X)ft°=o ^ The law of large numbers justifies 

the correctness of the estimate, and the A/-dependent central limit theorem 
gives an estimate of the statistical uncertainty. 



3.4. Markov chains 

Markov chains are one of the simplest types 
of stochastic processes. Markov chains were in- 
troduced by the Russian mathematician Andrei 
Markov (1856-1922) in 1906. In this section, we 
will discuss Markov chains in great detail. 

A Markov chain is a stochastic process 
which generates a sequence of states with prob- 
abilities depending only on the current state of 
the system. Consider a system which can be in 
one of R states si, S2, ■ ■ ■ , sr (again, generaliza- 
tion to a continuum of states is usually straight- 
forward). The system moves or steps succes- 
sively from one state to another (we shall only 
consider discrete "time" Markov chains). If the current state is Si, then the 
chain moves to state Sj at the next step with probability pij which does 
not depend on any previous states of the chain. The probabilities pij are 
called the transition probabilities. The square R x R real-valued matrix P 
whose elements are pij is called the transition matrix or the Markov matrix. 
Furthermore, we shall only deal with time homogeneous chains in which the 
transition probabilities pij are independent of "time" or their position in 
chain. 
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Let us start with some basic properties of Markov ehains. 

• The transition matrix P has only non-negative entries pij > 0. 

• Since the probabihty of going from to any state must be unity, 
then matrix elements must satisfy X^jLi Pij ~ 1 (rows sum to 
unity). 

• If the columns also sum to unity, then P is called a doubly stochastic 
matrix. 

• If Pi and P2 are Markov matrices, then the matrix product P1P2 
is also a Markov matrix. 

• Every eigenvalue A of a Markov matrix satisfies |A| < 1. 

• Every Markov matrix has at least one eigenvalue equal to unity. 

It may be helpful at this point to review the properties of the eigenvalues 
and eigenvectors of real square matrices. 

• For a square matrix P, a nonzero column vector v which satisfies 
Pv = Av for complex scalar A is known as a right eigenvector cor- 
responding to eigenvalue A. Often, "right eigenvectors" are simply 
called "eigenvectors" . 

• A nonzero vector v satisfying v^P = Av^, where T indicates trans- 
pose, is known as a left eigenvector. 

• Every square R x R matrix has R complex eigenvalues, counting 
multiple roots according to their multiplicity. 

• For a real square matrix, the eigenvalues are cither real or come in 
complex conjugate pairs. 

• Eigenvectors for distinct eigenvalues arc linearly independent. 

• A degenerate eigenvalue may not have distinct eigenvectors. 

• R linearly independent eigenvectors are guaranteed only if all R 
eigenvalues are distinct. 

• A matrix P and its transpose P"^ have the same eigenvalues. 

Now let us look at the last two properties of Markov matrices above in more 
detail. 

• Every eigenvalue A of Markov matrix P satisfies |A| < 1. 

Proof. Suppose that a complex number A is an eigenvalue of P with cor- 
responding eigenvector v so that Pv = Av. Let k be such that \vk\ > \vj\ 
for all j, then the fc-th component of the eigenvalue equation gives us 
'^jPkjVj = Xvk- Use the generalized triangle inequality for complex num- 
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bers \Y.k^k\ < Y.k kfcl to show 

|Awfc| = \J2jPkjVj \ < < "EjPkjHl = \vk\ 

Thus, |Awfc| = < \vk \ implymg |A| < 1, which completes the proof. □ 

• Every Markov matrix P has at least one eigenvalue equal to unity. 

Proof. Let v be a vector satisfying vj ~ 1 for all j, then '}2jPij'"j ^ 
TlijPij = 1 = w;. Hence, v is an eigenvector corresponding to eigenvalue 1, 
so every Markov matrix has at least one eigenvalue equal to unity. □ 

Multi-step probabilities are determined from powers of the Markov ma- 
trix. The ij-ih element p^"-* of the matrix P" is the probability that a 
Markov chain, starting in state Si, will be in state Sj after n steps; this is 
usually called the n-step transition probability. For example, the probabil- 
ity to go from Sj to Sj in 2 steps is 'Y^^=iPikPkj- For a starting probabil- 
ity vector u, the probability that the chain is in state Sj after n steps is 
Uj"^ = X^ili "^iPij^ ■ Note that Ui is the probability that the starting state is 
Si. The previous expression can be written in matrix form by u*^")-^ = u-^P". 

Another important concept is the first visit probability, which is the 
probability that a Markov chain, starting in state Si, is found for the first 
time in state Sj after n steps. This first visit probability is here denoted 
by fij'^- We define f^j^ = 0, and for one step, f^p = py. For two steps, 
fif ^ ^k^j PikPkj which generalizes to n-steps as 

fij ^ — ^Pifc fkj 

An important relation for later use is 

n 

(n) ^ Am) (n-m) 

m— 1 

The total visit probability fij is the probability that, starting from state 
Si , the chain will ever visit state Sj : 

oo 
n=l 

The mean first passage time rriij from st to sj is the expected number of 
steps to reach state Sj in a Markov chain for the first time, starting from 
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state Si (by convention, ma = 0): 

oo 

E„(ti) 
" A" • 

Tl=l 

The mean recurrence time fii of state Si is the expected number of steps to 
return to state Si for the first time in a Markov chain starting from s^: 

oo 
n=l 

Classes are an important concept in studying Markov chains. State Sj 

(n) 

is called accessible from state Si if p^^ > for some finite n. This is often 
denoted by Si — s- Sj. Note that if Si Sj and Sj s^, then Si Sk- States 
Sj and Sj are said to communicate if Si — > Sj and ^ s;; this is denoted 
by Si Sj. Note that Si <-!■ Sj and Sj <-> s^ implies s^ ^ s/j. A c/ass is 
a set of states that all communicate with one another. If Ci and C2 are 
communicating classes, then either Ci ~ C2 or Ci and C2 are disjoint. To 
see this, start by noting that if Ci and C2 have a common state s^, then 
Sj Sji for all Sji G Ci and Sj Sj2 for all Sj2 G C2, so s^i Sj2, 
implying Ci =6*2. This means that the set of all states can be partitioned 
into separate non-intersecting classes. Also, if a transition from class Ci to 
a different class C2 is possible, then a transition from C2 to Ci must not 
be possible, since this would imply Ci ~ €2- 

A Markov chain is called irreducible if the probability to go from every 
state to every state (not necessarily in one step) is greater than zero. All 
states in an irreducible chain are in one single communicating class. 

States in a Markov chain can be classified according to whether they 
are (a) positive recurrent (persistent), (b) null recurrent, or (c) transient. 
A recurrent or persistent state has fa = Xi^i /fi"^ = 1' thai is, there is 
unit probability of returning to the state after a finite length of time in 
the chain. A transient state has fa = Y^'^=i fii^ < 1. A recurrent state is 
positive if its mean recurrence time is finite fii < 00; otherwise, it is called 
null. 

In addition, states in a Markov chain can be classified according to 
whether they are periodic (cyclic) or aperiodic. The period of a state in 
a Markov chain is the greatest common divisor of all n > for which 

(n) 

Plj^ > 0. In other words, the transition Sj to Sj is not possible except for 
time intervals which are multiples of the period d{i). A periodic state Sj has 
period d{i) > 1, whereas an aperiodic state Sj has period d{i) = 1. 
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For a recurrent state, Yl^=iPii/' ~ whereas for a transient state, 

Eoo In) ^ 



Proof. Start with the following: 

N N n N N-m N N 

n—l 71—1 m—1 m—1 n—0 ra—1 n—0 

since pj"'' > 0, but for N > N', we also have 

N N N-m N' N-m N' N-N' 

Y^ = E /fi"^ E ^ > E 4™^ E ^1? ^ E 4™^ E 

n—l m—1 n—0 m—1 n—0 m—1 n—0 

again since the p^""* > and fl™'^ > 0. Putting together the above results 
yields 

AT' N-N' N N N 

m—1 n—Q n—l m—1 ?i — 

Take N —> oc first, then A^' ^ oo to get 



f.Yp^'<Yp^'<hYp^i' ^ f.Y^ = Yp^ 



(n) 
ij ■ 

n=l 



Set i = j and use pf^ = 1 to see that + ^^=1^1"^) = Yl°^=iPu \ so 

(n) fa 



1 - h 

n=l 

Now use the facts that fa = 1 for a recurrent state and fa < 1 for a 
transient state to complete the proof of the above statements. □ 



Note that the above results also imply 

oo 

Ypv' = 



(n) _ /i 

n=l 



A Markov chain returns to a recurrent state infinitely often and returns 
to a transient state only a finite number of times. 

Proof. Let g.y (to) denote the probability that a Markov chain enters state 
Sj at least to times, starting from s^. Clearly (7^ (1) — fij- One also sees that 
gij{m + 1) = fijgjjim), so 5y (m) = (/ij)™. The probability of entering Sj 
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infinitely many times is gtj = lim„i_,oo = liiiim-too(/jj )™, so starting 

in Sj, then 

, „ f 1, for recurrent state since f,., = 1, 

g-jj = Imi (/„■) = < ' 

m^oo 1^ 0, for transient state smce jjj < 1, 

wfiicli completes the proof. □ 

Another important result for recurrent states is as follows: if Si is recur- 
rent and Si Sj, then fji ~ 1. 

Proof. Let a denote the probability to reach Sj from Si without previously 
returning to s,;, and since Si — > Sj we know that a > 0. The probability 
of never returning to Si from sj is 1 — fji, and the probability of never 
returning to Si from Si is at least a{l — fji). But Si is recurrent so the 
probability of no return is zero; thus, fji = 1. For two communicating 
states Si Sj that are each recurrent, it follows that fij = fji = 1. □ 

All states in a class of a Markov chain are of the same type, and if 
periodic, all have the same period. 

Proof. For any two states Si and Sj in a class, there exists integers r and 

(r) (s) 

s such that p — a > and Pj/ = /3 > so 



{n+r+s) (r) (n) (s) . V"^ (r) (n) (s) . (r) (n) (s) r, (n) 

=2^ Pik Pkl Ph > 2^ Pik Pkk Pki > Pi, P)j Py = (^Pp)j ■ 
kl k 



If Si is transient, then the left-hand side is a term of a convergent series 
SfcPif^ < oo, so the same must be true for p'.jf , and if — > 0, then 
Pj'j — > 0. The same statements remain true if the roles of i and j are 
reversed, so either both Si and Sj are transient, or neither is. If Sj is null 
(infinite mean recurrence time /ij = X^^i fjj^ ~ then s; must be 
null as well. The same statements are true if i, j are reversed, so if one is a 
null state, then so is the other. Similarly, we can also conclude that if either 
Si or Sj is positive recurrent (finite mean recurrence time), then so is the 
other. Thus, we have shown that all states in a class are of the same type 
(positive recurrent, null recurrent, or transient). 

Suppose Si has period t, then for n = 0, the right-hand side of the 
above equation is positive, so vfi^'^^ > 0, which means that r + s must be 
a multiple of t. Hence, the left-hand side vanishes unless n is multiple of t, 
so pj"' can be nonzero only if n is multiple of t, which means that Si and 
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Sj have the same period. Note that the chain is aperiodic if pu > for at 
least one s^. □ 

States in an irreducible chain with period d can be partitioned into d 
mutually exclusive subsets Gq, ■ ■ ■ , Gd-i such that if state Sfc S Ga, then 
Pik ~ unless n = a + vd. 

Proof. Since irreducible, all states have the same period d and every state 
can be reached from every other state. There exist for every state two 
integers a and h such that p^^^ > and p]^^ > 0, but p^"^''^ = J2j Pij'pfi — 
PikPl^i > 0, SO a + divisible by d. Thus, a + b = md for integer m, or 
a = — 6 + md. Rewrite this &s a = a + vd for integer v and < a < d. The 
parameter a is characteristic of state Sk so all states are partitioned into d 
mutually exclusive subsets Gq, Gi, • • • , Gd-i- □ 

With proper ordering of the Ga subsets, a one-step transition from a 
state in Ga always leads to a state in Ga+i^ or from G^-i to Go - Each subset 
Ga can be considered states in an aperiodic Markov chain with transition 
matrix P'^. 

As an aside, consider the following fact concerning finite Markov chains: 
in an irreducible chain having a finite number R of states, there are no null 
states and it is impossible that all states are transient. 

Proof. All rows of the matrix P" must add to unity. Since each row con- 

(n) 

tains a finite number of non-negative elements, it is impossible that p\j 
for all i,j pairs. Thus, it is impossible that all states are transient, so at 
least one state must be non-null. But since the chain is irreducible (one 
class), all states must be non-null. □ 

In fact, in an i?-state irreducible Markov chain, it is possible to go from 
any state to any other state in at most R — 1 steps. 

We now need to consider a very important theorem (often referred to 
as the basic limit theorem of the renewal equation) about two sequences. 
Given a sequence /o, /i, /2, ■ • ■ such that 

oo 

/0 = 0, 0</„<l, $^/n-l, 

n=0 

and greatest common divisor of those n for which /« > is d > 1, and 
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another sequence mq, ui, U2, . . . defined by 

n 

Uo = l, Un= ^ fmUn-m, {n > 1), 

m— 1 

then 



hm Und 

n — >oo 



dfi ^ if /i = nfn < oo, 

71=1 

if /I = oo. 



Proof. Sec Rcfs. 7,8 for a complete proof. Here, we shall only provide a 
sketch of the proof of this theorem. First, note some key properties of these 
sequences. We know that < /„ < 1 for all n since /« > and X^^o fn = ^■ 
Also, < w„ < 1 for all n can be established inductively. To do this, first 
note that uq = 1, ui = /i, U2 = /2 + /i satisfy the above bounds. Assume 
< Ufe < 1 for all < A: < n. Since /m > and J2m=i /m ~ ^- then 
Un+i = X^m^i fmUn+i-m > sincc it is a sum of nonnegative terms, and 
Un+i = Z]m=i fmUn+i-m < J2"n=i fm which complctcs the induction. 

Next, limit our attention to d = 1 (the nonpcriodic case). Since m„ 
is a bounded sequence, A = limsupjj^o^Un is finite, and there exists a 
subsequence ni < n2 < ■ ■ ■ tending to infinity such that lim-,-^co = A. 
The next step in the proof is to show that limj_»oo Unj-q = A for any integer 
q>0 when /i > (we skip this here). 

Now define a new sequence r„ = Ylk>n fk- Some important properties 
of this sequence are r„ > for all n, ?'o = 1, r„__i — r„ = /„ for n > 1, and 
= = M- One very crucial identity is 

N 

TkUM-k = 1, for all > 0. 

fe=0 

To see this, define Ap^ ~ 12'k=o^k'U'N-k- Start with 

N N 

WAT = ^ fmUN-m = ^ (fm-1 " ''m)uAr-m, 
m— 1 m— 1 

use To = 1; and rearrange to get 

N N 
m— 1 7n— 1 

Take m — > fc + 1 on the right: 



N N-1 

E 

m=0 fe=0 



rmUN-m = ^ rkUN-l-k, 



40 



which shows — Aw-i for aU N. Thus, An — ^at-i — An^2 — ■ ■ ■ = 
Ao = rouo = 1. 

RecaU that ni < n2 < ■ ■ ■ is a subsequence such that hnij^oo Un-^q = A 
for any integer g > 0. Since X]fc=o ^fc"nj-fe ^ 1 f^i' ^^^"^ t"*; > 0, Ufe > 

for all fc, then X^feLo ^kUnj-k < 1 for fixed N < Uj. Take the limit j oo 
so 

hm V'rfeUn^-fe = A V'rfe < 1. 

J— >oo ' 

fe=0 fc=0 

We already know that A > 0, so take TV — > oo to have 



0< A< l/(^rfc). 



fe=0 



If X^feLo ^fc = then lim„^oo itn = A = 0. If /i = J2kLo finite, N —f oo 
gives /iA < 1. Define M = sup,j>Q u„ so < Ufc < il/ < 1 for all k, and define 
= J^'kLo+i ^'1^ - noting that g{D) > for all D and limc^oc ff(-D) = 0. 
Consider 

D rij 

'^rkUn^-k+ ^ rfcM„^.„fc = 1, forD<nj. 

fe=0 A;=D + 1 



Thus 



D 



^ rfcu„^_fe + Mg{D) > 1 for < 



Take j oo to conclude 



Take the limit ^ oo to obtain A/x > 1. We have now shown 1 < ^A < 1 
so ^A = 1. The proof for the nonperiodic (d = 1) case is now complete. 

When d > 1, we know /,„ = unless m = nd for integer n. One can 
then show Um = unless m = nd. Define new sequences = fnd and 
u'n = ""nd for n = 0,1,2, ... . Since the new sequence is aperiodic, we know 
lim„^ooMji = 1//^' where /i' = J2n=o'^fn- Since /m = when m ^ nd, 
then 

oo oo 
n— m— 

Thus, linin^oo Und = df.i~^ as required. □ 
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An important feature of a Markov chain is the asymptotic behavior of 
its ri-step probabihties as n becomes large. First, consider the n-step 
probabihty to go from state Sj back to Sj as n becomes large. This behavior 
can be summarized as 

Sj transient or null recurrent, 
lim p^'^"' = I fi~^, Sj aperiodic positive recurrent, 

Sj positive recurrent with period d. 

Proof. If Sj is transient, then X^nPjj' finite (converges), requiring 
pj"' — ^ 0. For a recurrent Sj, let /„ = /j"' and w„ ~ P'jf ■ The sequences 
fn,Un so defined satisfy the conditions of the basic limit theorem of the 
renewal equation previously discussed, which tells us that P^^"^ '^^^J^ 
where ^j = "/j^' mean recurrence time. Of course, the ape- 

riodic case applies when d = 1. If Sj is null recurrent, then /i^ = oo so 



(n) 

Next, the asymptotic behavior of p-j can be summarized as 

J. („) ( 0, Sj transient or null recurrent, 

n^oo^*-' fijf^J^j Sj aperiodic positive recurrent. 



Here, we will ignore the periodic case. 
Proof. Start by noting that 

n n n 

m=l m=l m=n' + l 

Since < E:.„.+i //TM"""^ ^ then 

Take ri — s- cx), then n' — > oo above, and denote pjj = lim„^oo Pj"^ to deduce 
< Mim„^oo Py"' - Pjjfij) < => lim„_^ooP-"^ = Pjj fij- 



For the case of Sj transient or null recurrent, pjj = and finite, so 

lim„^oo Pi"' = 0. For Sj aperiod and positive recurrent, pjj = fij^ so 
(") J- -1 „ 



The above information will be needed in proving a very important prop- 
erty of irreducible aperiodic Markov chains. Before getting to this property. 
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it is convenient to introduce a few definitions and remind the reader of two 
lemmas concerning sequences. 

A probability vector w is called stationary or invariant or a fixed-point if 
w-^ = w^P. Clearly, one also has w-^ = w-^P". If one starts a Markov chain 
with an initial probability vector that is stationary, then the probability 
vector is always the same (stationary) for the chain. When this occurs, the 
Markov chain is said to be in equilibrium. 

Fatou's lemma and the dominated convergence theorem will be needed 
in demonstrating an important property of Markov chains, so these theo- 
rems are briefly recapped next. 

Fatou's lemma: Let a„(i) for n =1,2,... be a function on a discrete 
set T ~ {1, 2, . . . }, assume lim„_»oo exists for each t in T , and suppose 
a„(t) > for all t, n, then 



( lim a„{t)) < lim y^a„(i). 

\n— >-oo / n — >oo ' ^ 



E 



Proof. For any integer M 

M M 



( lim an{t)] = lim y'a„(t) < lim y'a„(t), 

^ — ^ \n — >OQ / n — >oo ^ — ^ n — 'OO ^ — ^ 

t=l t=l t=l 

since all a„(t) > 0. Take the limit M ^ oo to obtain required result. □ 



This lemma shows that taking the limit then summing the sequence is 
not the same as summing the sequence, then taking the limit. For exam- 
ple, consider a„(t) = n/in? + t^). For n > t then lim„_+oo a„ (t) = so 
Stli (hm„^oo anit)) = 0. But 

E ^ coth(?i7r) - -!- so lim ^ an(i) = ^- 
t=i t=i 

The dominated convergence theorem: Let a„(t) for n = 1, 2, . . . be 
a function on a discrete set r={l,2,...}, assume lim„^co o,n(t) exists for 
each t in T, and suppose a function B{t) exists such that |a„(t)| < B{t) for 
all t, n and J2teT then 

( lim a„(t)) = lim y'a„(i). 

^ — ^ \n — >QO / n — >QO ^ — ^ 

teT teT 

Proof. Let a{t) = lim a„(i) and given \a{t)\ < B{t), then J2'u=i'^i^) "^o^" 
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verges since X^t^i converges. For any integer A/, 

oo oo A/ oo 

|^a„(t)-^a(t)| <^|a„(i)-aWI+ E + \a{t)\'^ 

t=l t=l t=l t=M+l 

Next, take the limit n oo, and for a finite sum of positive terms, the 
summation and limit can be taken in any order: 

M AI 

lim V |a„(i)-a(t)| = V( lim |a„(i)-aW|) = 0. 

n — >OD ^ — ^ ^ — ^ \n — 'OO / 

t=l t=l 

We also have 

£ (|a„(t)| + |aW|) <2 £ B{t), 

t=M+l t=M+l 

SO for any integer M , 



lim Va„(t)- V lim a„(i) <2 V 

I. — 'OO ' ^ ' ^ n. — 'OO ' ^ 

t=M+l 



n — 'OO ■* — ' ^ — ' n — 'OO 

t=l t=l 



The right-hand side is the remainder of a convergent series, which means 
that it must equal zero in the M ^ oo limit. The equality to be shown 
easily follows. □ 

The dominated convergence theorem essentially specifies the conditions 
under which the order of taking an asymptotic limit and summing a se- 
quence does not matter. 

And now, without further adieu, we state the very important funda- 
mental limit theorem for irreducible Markov chains: An irreducible 
aperiodic Markov chain with transition matrix P has a stationary distri- 
bution w satisfying Wj > 0, J^j ''-^'j ~ 1; ^^cl w-^ = w-^P if, and only if, all 
its states are positive recurrent, and this stationary distribution is unique 
and identical to the limiting distribution Wj = lim„^oo Pij independent of 
initial state Si. 

Proof. For an irreducible aperiodic chain, the following possibilities exist: 

(a) all states are positive recurrent (an ergodic chain), 

(b) all states are null recurrent, 

(c) all states are transient. 

(n) 

If all states are transient or null recurrent, then lim„_tooPj-j = 0. If all 
states are positive recurrent, then since all states communicate, = 1 
for all i,j and the basic limit theorem of the renewal equation tells us 
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that lim„_>oo Pi""* = fJ-J^- Let us define Wj ~ lim„_too Pi"'' — which is 
independent of the initial state Si. For all states positive recurrent, then 
< < oo so Wj > for all j. We have p,-™^"^ = J2T=i Pik^Pk^^ using 
Fatou's lemma: 



oo 



(m+n) \ ^ (n) (m) ^ \ ^ ,. (n) (m) 

hm p = hm 2^plk Pkj ' > 2^ lim P ^'p^j '■ 

fc=l A:=l 

Taking the limit n ^ oo yields Wj > J2T=i ''^^ p'kf' ■ E)cfine s = YlV=i '^k, 
then sum the above equation over j: 

oo oo oo oo oo oo 

j = l j = l k=l k=l j=l k=l 

where we used the fact that the rows of the Markov matrix and its powers 
sum to unity. Interchanging the order of the two infinite summations above 
is possible since all summands are non-negative (Fubini's theorem). We 
have shown that s > s, which means that the equality must hold: 

oo oo oo 

j=l J=l k=l 

But for each term, we have already shown that wj > X^fcLi ^k P^kj'^- Since 
each one of the terms in the summation is known to be greater than or 
equal to zero, we must conclude that the equality holds term by term for 
every j: 

oo 

E(m) 
Wk Pi/. 

k=l 

For TO = 1, we see that the limiting vector w satisfies the criteria for a 
stationary vector. Next, use '^JLiPij^ ~ 1 ^-i^d Fatou's lemma to show 
that 



oo 



1 = lim p,["'' > lim pi"-* = "S^ w 

n — »oo ^ — ^ ■' ^ — ^ n — >oo ^ — ^ 

Given "^jWj < 1, then consider the limit m — > oo of 



Uj ^ lim Wk Pi 

fe=l 



(m) 
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Since < p^™"* < 1, then < Wk and J2kLi Wk < oo so the domi- 

nated convergence theorem can be apphed: 



= hm Wk Pkf 



^ Wk hm p^™' 



fe=i 



fc=i 



k=l 



We can at last conclude that 



1. 



Only the uniqueness of the stationary state is left to show. If another 



stationary vector v existed, it would have to satisfy Vj > 0, 



1, 



and Vj = 



(n) 



Conditions for the dominated convergence theorem 



again apply, so taking the n — *■ oo limit gives 



Vj = lim y 



(n) 



EVi lim 
n — >oo 

1=1 



oo 



Since v = w, then w is unique. 



□ 



A simple example may help to understand the above result. Consider 
the following transition matrix 

"3 10 
2 1 

111 

4 4 2 

Since has all positive entries (greater than zero), this Markov chain is 
irreducible. The eigenvalues of P are 1, h, j^, and the unnormalized right 



and left eigenvectors are 



right: 



1 


1 


5 




1 


1 


5 


2 


12 




2 


12 


"1" 




2" 




3" 


left: 


2 




-1 




-3 


1 




-2 




-4 




3 









-1 


1 




1 




3 




2 




1 




4 



The left fixed-point probability vector and lim„- 



are 



w = — 
7 



hm P" 

n — *-oo 



2 3 2 
2 3 2 
2 3 2 



A positive recurrent chain guarantees the existence of at least one in- 
variant probability vector. Irreducibility guarantees the uniqueness of the 
invariant probability vector. Aperiodicity guarantees that the limit distri- 
bution coincides with the invariant distribution. 
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Suppose a Markov chain is started with a probabihty vector given by 
w, the left fixed-point vector of the transition matrix P. This means that 
the probabihty of starting in state Si is Wi. Then the probabihty of being in 
state Sj after n steps is (w-^P")j, but w-^P" — w'^ , so this probabihty is 
Wj. Thus, the probabihty vector is always the same, that is, it is stationary 
or invariant. When this occurs, the Markov chain is said to be in equi- 
librium. Recall that an ergodic (aperiodic, irreducible, positive recurrent) 
Markov chain which starts in any probability vector y eventually tends 
to equilibrium. The process of bringing the chain into equilibrium from a 
random starting probability vector in known as thermalization. 

An ergodic Markov chain is reversible if the probability of going from 
state Si to Sj is the same as that for going from state Sj to Si once the chain 
is in equilibrium. Since the probability that a transition from Si to Sj occurs 
is the probability Wi of finding the chain in state Si in equilibrium times 
the transition probability pij , then reversibility occurs when WiPij ~ WjPji . 
The above condition is often referred to as detailed balance. Note that 
detailed balance guarantees the fixed-point condition: since pij = 1 then 

Y^WjPji ^Y^WiPij = Wi. 

j j 

Since an irreducible aperiodic Markov chain with positive recurrent 
states in equilibrium is a stationary stochastic process, we can simply adapt 
the Monte Carlo integration formulas for stationary stochastic processes. 
Hence, the Monte Carlo method of integration using a Markov chain in 
equilibrium is specified by 

N N~h 

if) ^ Rhif) ^ ]^ E (/(^o-(/>)(/(^.+o-(/>), 

1=1 j=i 

where the N points xi,...,xn in the Z?-dimensional volume V are ele- 
ments of an irreducible aperiodic Markov chain with positive recurrent 
states and stationary (and limiting) probability distribution p{x) through- 
out D-dimensional volume V. Note that the Markov chain must be in equi- 
librium, and as usual, the stationary probability distribution must satisfy 
the normalization condition p{x)d^ x = 1. The autocovariance must be 
absolutely summable X^ft^o l^hiDl < oo- 

Once again, let us pause for some reflection. We have seen that multi- 
dimensional integrals can be estimated using the Monte Carlo method. 
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but importance sampling is often crucial for obtaining estimates with suffi- 
ciently small statistical uncertainty, especially when the integrand is peaked 
in one or more regions. The rejection method can be used in one or few 
dimensions, but is difficult or impossible to apply when the dimensional- 
ity of the integration becomes large. In such cases, the use of a stationary 
stochastic process is often our only option. A particularly useful type of 
stationary stochastic process is an ergodic (positive-recurrent, aperiodic, 
and irreducible) Markov chain in equilibrium. The amazing fundamental 
limit theorem for ergodic Markov chains tells us that such a Markov chain 
has a unique stationary distribution which is also the limiting distribution. 
Hence, we can start the chain with any initial probability vector and are 
guaranteed that the probability vector will eventually evolve into the re- 
quired stationary vector. The uniqueness of the stationary vector and the 
coincidence of the stationary vector with the limiting vector make ergodic 
Markov chains especially useful for Monte Carlo applications. 

Points generated by a Markov process depend on previous elements in 
the chain; as stated earlier, this dependence known as autocorrelation. This 
autocorrelation depends on the observable (integrand) being estimated. For 
any observable (integrand) O;, the autocorrelation p(r) is defined by 

{0^0^+r) - 

m-m' ■ 

Highly correlated points yield an autocorrelation value near unity; inde- 
pendent points produce a value near zero. Decreasing autocorrelations de- 
creases the Monte Carlo error, as can be seen from the error formula above. 
Usually the dependence decreases as the number of steps between elements 
in the chain increases, so a simple way to decrease autocorrelations is to 
not use every element in the chain for "measurements" , and instead skip 
some number of elements between measurements. 

3.5. The Metropolis-Hastings method 

We generally know the probability density 7r((/)) that we need to sample to 
evaluate the integral J 0(^)7r((/))I?(/), where </> represents a vector of integra- 
tion variables and the observable 0{(j)) is some function of the 0. For our 
path integrals, we need to generate paths with a probability distribution 

g-5[0]/ri 

where S(<j)) is usually a real- valued action. In the imaginary time formalism, 
this path integral weight is real and positive, allowing a probability inter- 
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pretation to facilitate importance sampling in the Monte Carlo method. 
In order to sample the probability density 7r(0), we need to construct a 
Markov chain whose limiting stationary distribution is 7r((^). But how do 
we construct the Markov transition matrix P{<j> 0)? 

There are several answers to this question, but we shall focus only on the 
simplest answer here: the Metropolis-Hastings method. This method 
is very simple and very general. It also has the advantage that the prob- 
ability normalization never enters into the calculation. Its disadvantage is 
the presence of strong autocorrelations since only updates which change 
the action by a small amount are allowed. 

To describe this method, let us first change to a quantum mechanical 
notation of putting earlier states on the right, later states on the left. The 
Metropolis-Hastings algorithm uses an auxiliary proposal density R{(j) ^ 0) 
which 

• must be normalized. 

• can be evaluated for all (f), (j), 

• can be easily sampled, 

• and needs no relationship to the fixed-point probability density n((f>). 
Given this proposal density, the Metropolis-Hastings method updates the 
Markov chain ^ as follows: 

(1) Use R{(j) <— (f)) to propose a new value cj) from the current value 0. 

(2) Accept the new value with probability 

(3) If rejected, the original value (j) is retained. 

A rule of thumb is to tweak any parameters in the proposal density to obtain 
about a 50%— 60% acceptance rate. A higher acceptance rate might indicate 
that the proposal density is exploring the integration volume too slowly, 
whereas a lower acceptance rate might indicate that too much computer 
time is being wasted attempting updates that get rejected. If the proposal 
density satisfies reversibility R{4) <—</)) = R{(j) <— 0), then the acceptance 
probability reduces to min(l, 7r((/))/7r ((/))), which is known as the Metropolis 
method. 

The Metropolis-Hastings method produces a Markov chain which satis- 
fies detailed balance. 
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Proof. The (normalized) transition probability density is 

+ 5{^~(f>) (l - fv^ Pacc(0 ^ (t>)R& ^ 0) 



Define 

A(0 ^ 0) = Pacc(^ ^ ^ 0)^(0), 



= min (i?(</) ^ 0)^(0), i?(0 ^ ,/))^(,/))) , 



where the last line follows from P(0 ^ (/))7r(0) > 0. Note that this quantity 
is symmetric: A{(f) ^ 0) = A(0 ^ (f>). So we have 



+ 5(0 - 0) (^1 - y 2?0 Pacc(0 ^ 0)P(0 ^ 0) j ^(0), 

= ^(0 ^ 0) + - 0) (^(0) - y P0 ^(0 ^ 0)) , 
=^(0.-0)+ 5(0-0) if(0), 

A'(0)==^(0)- /p0^(0^0). 



where 



Given the symmetry of A and the Dirac 5-function, then detailed balance 
holds: 

W{4> ^ 0)7r(0) = W{(j) ^ 0)7r(0), 
as was to be shown. □ 

Does this really work? Consider a one dimensional example to answer 
this. Let g{x) = cos(vl+^) and h{x) ~ e^^' /{x^ + 2). Notice that g{x) 
changes sign, but h{x) > so h{x) is suitable for importance sampling. 
Consider evaluating the ratio of integrals 

r oix^hix^dx 
^ = T°o \ / = 0.3987452 . . . , 

using a Markov-chain Monte Carlo method with importance sampling den- 
sity tt{x) = Z^^h{x), where Z = h[x)dx. A simple Metropolis imple- 
mentation would be as follows: choose a value 5 with uniform probability 
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Fig. 7. Monte Carlo estimates of the ratio of integrals g{x)h{x)dx / h{x)dx, 

where g{x) = cos(\/l + x'') and h{x) = jix? + 2), against the number of Markov 
chain elements used. A simple Metropolis method was used with as the sampling 
probability density. The horizontal line indicates the exact answer. 

in the range — A < (5 < A, propose x = x + (5 as the next element in the 
chain, then accept with probabihty min(l,7r(a;)/7r(x)) = min(l, /i(a;)//i(a;)). 
Some Metropolis estimates for various values of A^, the number of random 
points used, arc shown in Fig. 7. A value of A = 1.5 was found to yield an 
acceptance rate near 50%. Note that we never needed to evaluate Z. The 
horizontal line is the exact answer. One sees that the method really does 
work. 

4. Monte Carlo study of the simple harmonic oscillator 

As a first simple example, let us apply the Monte Carlo method to evaluate 
path integrals in the onc-dimensional simple harmonic oscillator. The action 
in the imaginary time formalism is given by 



To carry out a Monte Carlo evaluation, it is necessary to discrctize time 
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where e should be chosen so discretization errors are sufficiently small. 
Introduce the dimcnsionlcss parameters 



so that the action can be written 



1 2 2 
AC = -£ W , 



h 2 

and with a few more manipulations, the action becomes 



f = ^(l + «)(rfo + c«^) + (! + «) 



The first constant is irrelevant, so it can be discarded, then one last rescaling 

1 - AC 











-(l-At) 


E ^j'^j+i 









djVT+Ac, g = , do = d-N = 0, 



1 + Ac' 



yields the final form for the action: 



S 

h 



N-1 




N-1 


E"l 


-9 


E 






_3=0 



In this form, we have set uq = ujy = 0, which is tantamount to requiring 
Xa = Xb = 0- The observables we will compute will be independent of the 
choice of the initial Xa and final x^ locations of the particle. A given path is 
specified by a vector u whose A^— 1 components are Uj for j ~ 1,2, . . . , N—1. 

We must now devise an auxiliary proposal density in order to produce 
a Markov chain using the Metropolis-Hastings method. There is consider- 
able freedom in designing such a proposal. If we use an auxiliary proposal 
that simultaneously changes all iV — 1 components of u, one finds that the 
resulting changes to the action are rather large, and the acceptance prob- 
ability becomes nearly zero. In order to get a reasonable acceptance rate, 
we must make only small changes to the action. This can be accomplished 
most easily if we only change one of the Uj at a time. The most natural 
way to proceed is to randomly pick one time slice and perform a local up- 
date of that time slice. If equal probabilities are assigned to each time slice, 
then detailed balance is maintained and covering the entire hypervolume of 
integration is ensured. 

A simple procedure for updating the path u ynew follows: 
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(1) Randomly choose an integer j from 1 to Nt — 1, where Nt is the number 
of time shoes, with equal probability. 

(2) Propose a random shift Uj — > Uj = Uj + S with S chosen with uniform 
probability density in the range —A < 6 < A. 

(3) Calculate the change SS in the action: 

SS/h = 6 (S + 2uj — g{uj^i + Wj+i)) . 

(4) Since R{uj ^ uj) = R{uj ^ uj), then accept the proposed value 
u^'^'" = Uj with probability min(l, e^'''^/''). If not accepted, then retain 
the old value: m"®^ = uj. 

(5) Repeat the above procedure Nt times to constitute one updating sweep. 

The rule of thumb for setting the value of A is to achieve an acceptance 
rate around 50%. A lower rate means that too much time is being wasted 
with rejections, whereas a higher rate means that the Markov chain might 
be moving through the integration hypervolume too slowly. 

To start the Markov chain, one can either choose a random path (hot 
start) or choose Uj = for all j (cold start). One then updates A^thGrm 
number of sweeps until the fixed point of the chain is reached (thermal- 
ization) ; usually, a few simple observables are monitored. Once the Markov 
chain is thcrmalized, the "measurements" can begin. The parameters in the 
simulation are chosen according to the following guidelines: 

• Choose e so that discretization errors are sufficiently small. 

• Choose A for an adequate acceptance rate. 

• Choose the number of sweeps A^swocps between measurements to achieve 
sufficiently small autocorrelations. 

• Choose the number of measurements iVmeas to achieve the desired pre- 
cision in the results. 

The results of some actual Monte Carlo computations using ute ~ 0.25 
with Nt = 1000 time slices are shown in Figs. 8-10. The Metropolis accep- 
tance rate is shown in the left-hand plot in Fig. 8. To get an acceptance 
rate around 0.5-0.6, one sees that A = 1.5 is a good choice for uje = 0.25. 
Autocorrelations were monitored using a typical observable chosen to be 
{u{tq + 5)u{tq)), where tq is taken near the midpoint between the path 
end-points (we actually averaged over a large number of tq values in the 
middle region for increased statistics). The autocorrelation function for this 
observable is shown in the right-hand plot of Fig. 8 (dashed line). One sees 
that about 100 sweeps are needed to reduce autocorrelations down to the 
level of 0.1. 
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Fig. 8. Left: Metropolis acceptance rate versus the parameter A for the simple har- 
monic oscillator with cue = 0.25. Right: Autocorrelation function associated with 
(u{tq + 5)x{ro)) against the number of Metropolis sweeps for updating the time slices 
Uj sequentially (solid line) and randomly (dashed). When updating the time slices in 
random order, a sweep refers to Nt successive local updates, where Nt is the number of 
time slices. 



Updating the path one Uj at a time in random order with each j-value 
being equally likely to be chosen ensures detailed balance and coverage of 
the entire integration volume. But a simpler method would be to just se- 
quentially sweep through the uj, updating each uj one at a time. With 
much less calls to the random number generator, sequential sweeps would 
take less computer time. Coverage of the entire integration volume is again 
ensured, but detailed balance is lost. However, detailed balance was use- 
ful only in that it ensured that the ergodic Markov chain had a unique 
fixed-point. Even though detailed balance is lost when sequentially sweep- 
ing through the time slices, the fixed-point condition is maintained. Thus, 
sequential sweeps are totally acceptable. When updating randomly chosen 
Uj, the Markov matrix P is the same for each local update. When updat- 
ing Uj sequentially, the Markov matrix is different for each local update. 
However, the Markov matrix for an entire sweep Pswcop, being the prod- 
uct of the Markov matrices for each time slice, is the same for each sweep. 
So the theoretical foundations described previously still apply, as long as 
we apply them to Pgweep- The autocorrelation function for the observable 
{u(to + 5)u{tq)) when updating time slices sequentially is shown as a solid 
line in the right-hand plot of Fig. 8. One sees that there is no adverse affect 
on the autocorrelations. 

Portions of paths produced in an actual Markov chain with sequential 
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Fig. 9. A few paths Uj for j = 400 — 550 with Nt = 1000 from an actual Monte Carlo 
simulation of the simple harmonic oscillator with Lue = 0.25. 

time-slice updating for oje = 0.25, Nt = 1000, A = 1.5 are shown in Fig. 9. 
Monte Carlo estimates for the correlation function {u{tq + t)u{to))) as a 
function of r are compared to exact results in Fig. 10. 

5. Monte Carlo calculations in real scalar field theory in 
2+1 dimensions 

The action for a real scalar field in continuous Euclidean Z3-dimcnsional 
space-time in the imaginary time formalism is given by 
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0)8 = 0.25 
= 1000 
N = 40000 

mc 

N = 100 

sweeps 

A = 1.5 




Fig. 10. Monte Carlo estimates (circles) of the correlation function {u{to + r)n(ro)) 
are compared to the exact results (solid line) for the simple harmonic oscillator with 
oje = 0.25. 



Note that the action must be dimensionless in natural units ft = c = 1, 
and since m has units of a derivative 9^, that is, of a mass, then the field 
must have dimension [ip] = [mjs^^i, requiring that the coupling g has 
units [g] = [m]^~^. Thus, the coupling is dimensionless in 4 space-time 
dimensions, but has units of mass in 3 space-time dimensions, leaving g/m 
dimensionless. We require 5 > or the action will have no minimum. 

Quantization of this field theory can be accomplished using path inte- 
grals, but the notion of a "path" must be generalized: a path here is a field 
configuration in both space and time. The path integral now consists of in- 
tegrations over all field configurations. For a real scalar field, we now have 
an integral —00 < ip{x) < 00 at every space-time point x. The time-ordered 
two-point function is given by 

{T(p(Xl)(p(X2)} = TT— , 

J exp(-S'[(^]) 

which generalizes to n-point functions, time-ordered product of n fields, in 
a straightforward manner. 
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A Monte Carlo study requires an action on a space-time lattice. We will 
use an anisotropic cubic lattice with temporal lattice spacing at and spatial 
lattice spacing Og. Discretization of the action is achieved by replacing field 
derivatives by the simplest finite differences, and integrals over space-time 
by suitable summations over space-time lattice sites. The action is given by 

c D-i Y-/v^ ('i5(a^+%/i)-¥'(a;))2 1 2 / n2 , 5 / n4 1 



X 



where is the lattice spacing in the /i direction. Redefine the field 
\/aF~^ ifix) = a/2k7 (j){x), where tig IS cL dimcnsionless number, so the 
new field (^{x) is dimensionless. Then introduce a few more dimensionless 
parameters: 

as/at = C, A = — DTi' 

Ks(a^TO^ + 2C^ + 21) - 2) = f - 2A, k = C^s, 
to obtain the final form for the lattice action: 

5" = ^ f — ^ E 'P{x)'P{x+o.sj) - 2kC (/)(x)0(a;-t-a(f) 

2 I \j,/„\4 



+ (1 - 2A)0(.t)'^ + A<?!)(a;) 

The hopping parameter k essentially sets the mass parameter, and A > is 
the interaction strength. In what follows, we shall focus solely on the above 
theory in 3 space-time dimensions. 

5.1. Exact results in free field limit A = 

The free field theory A = is exactly solvable. In this case, the path integrals 
are multivariate gaussians. The free action can be written in the form 

xy 

For N lattice sites, M is a real and symmetric Nx N matrix having positive 
eigenvalues and given by 

M{x,y) = — ^ ^(^S{y,x + asj) + d{x,y + asj)^ 
^ i=i 

-2kC {S{y, x + ati) + S{x, y+ad)) + 2S{x, y). 
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The path integrals encountered in the free field theory can be evaluated 
using the so-called J-trick: use derivatives with respect to an external source 
Jfc, followed by the limit 0, to evaluate all integrals involving any 

number of products of the fields: 

( / ) (f>mi4'vi2 ■ ■ ■ i'mr exp{-^(j)jMjk(l)k) 

= lim — — — Y\_{ / d4>i) exp{-^(j)jMjk(t>k + Jn4>n), 

,7„^0 OJ„i^ O'Jmr \J — oo J 

lim T7 dct — exp (U.M^^J, 



This trick does the Wick contractions automagically! 

The two-point function is given by {T4>{xi)cj){x2)) = M~^{xi,X2)- The 
inverse of M can be obtained by the method of Green functions and using 
Fourier transforms. For an 

X Ly X Lf lattice, the result is 
M"i(x v) = ^ V cosjk-ix-y)) 

where fc^ = 27rn^i/L^ for = 0, 1, 2, . . . , — 1. The pole in the two-point 
function gives the energy Uf Ep of a single particle of momentum a^p: 



atEp = 2sinh ^ ( ^ -^a^m^ -|- 4 sin^ ( ^ a^Pa; ) + 4 sin^ ( i a^Py ) 



For small at, as, this becomes Ep = ^m? + ~^Py- The spectrum is the 
sum of free particle energies. 



5.2. Metropolis updating 

The Metropolis-Hastings method is useful only when the auxiliary proposal 
density leads to a reasonable acceptance rate, typically around 0.5. If we 
simultaneously change field values at all lattice sites, the value of the action 
most likely changes by a large amount, and the Metropolis-Hastings accep- 
tance probability plummets to zero. However, if we propose a change only to 
the field value on one site, a reasonable acceptance rate can be achieved. To 
maintain detailed balance and ensure coverage of the entire integration re- 
gion, the site to be updated should be chosen randomly, with each site being 
equally likely to be selected. However, as in the case of the simple harmonic 
oscillator, one finds that updating the fields at sites selected sequentially. 
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sweeping through the lattice, works just as well. Although detailed balance 
is lost, the crucial fixed-point condition is maintained, and by sequentially 
visiting every site, coverage of the entire integration region is ensured. 

Thus, we shall use an auxiliary proposal probability density that sweeps 
through the lattice, visiting each lattice site sequentially, updating each and 
every site one at a time. In the battle against autocorrelations, we expect 
that such a local updating scheme should be effective in treating the small 
wavelength modes of the theory, but the long wavelength modes may not 
be dealt with so well. 

Recall that the action is 

5" = - — ^ (l){x)(j){x+asj) - 2KCcj){x)(j){x+ati) 



+(1 - 2\)^{xY + X(j}{xy 

Define the neighborhood N{x) of the site x by 



If the field at the one site x is changed (t){x) <f>{x) + A, then the change 
in the action is 

SS = a(^N{x) + (A + 2(j){x)) (^1 + a((A + 20(x))A + 2{(j){x)^ ~ ^O)) ' 

This change in the action can also be written 

SS = A (ao + aiA + aaA^ + a^A^), 
ao = N{x) + 2(P{x)(l + 2X{(t){xf - 1)), 
ai = 1 + 2A(30(a;)2 - 1), 

02 = i\(t){x), 

03 = A. 

Single-site updates involve a single continuous real variable (j). A simple 
proposal density is then 

f:j^, -Ao<(0-0)<Ao, 

[o, |^-0|>Ao. 

In other words, a value for A is chosen randomly with uniform probability 
density in the range — Ao < A < Aq, and the proposed value for the field is 
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(f>{x) + A. The width Ao is chosen to obtain an acceptance rate around 50%. 
The proposed new value is accepted with probabihty min(l, exp(— 55*)). If 
rejected, the current field value is retained. This single-site procedure is 
repeated at every site on the lattice, sequentially sweeping through the 
lattice. 

5.3. Microcanonical updating 

When the single particle mass atmgap is small, the coherence length ^ = 
l/(at?Tigap) becomes large. The so-called continuum limit of the theory is 
reached when the coherence length is large compared to the lattice spacing, 
that is, when ^ — *■ oo. However, ^ ^ oo only occurs near a second order 
phase transition (a critical point). One finds that autocorrelations with the 
above Metropolis updating scheme become long ranged as ^ becomes large; 
this is known as critical slowing down. Autocorrelations are problematic 
even for ^ w 5 with the above Metropolis updating. In such cases, we will 
need to use some other procedure to better update the long wavelength 
modes. 

Long wavelength modes are associated with lower frequencies and lower 
energies. In other words, long-wavelength modes are associated with very 
small changes to the action. A possible way to improve autocorrelations is 
to make large but action preserving SS ^ changes to the field at one site. 
We shall refer to this as a microcanonical update, but such schemes are 
often referred to as overrelaxation in the literature. Local updating is so 
easy, we do not want to give up on it yet! In devising our microcanonical 
updating scheme, we must ensure that the fixed-point of our Markov chain 
is unaffected. Note that microcanonical updating cannot be used just by 
itself since it docs not explore the entire integration region. Microcanonical 
updating must be used in combination with sonic scheme that does cover the 
entire integration volume, such as the above-described Metropolis sweeps. 

To facilitate the discussion of such microcanonical updating, let us first 
revisit the Metropolis-Hastings method, examining the case of a sharply- 
peaked proposal probability density. Suppose f{4>) is a well-behaved, single- 
valued, invertiblc function, then consider a proposal density given by a 
Brcit-Wigncr peaked about f{(j>): 



where £ is a constant. Notice that this probability density is properly nor- 




60 



malized: 

/oo 
d(/) Rf{(l)<-(f>) = 1. 
-oo 

With such a proposal density, the standard choice for the acceptance prob- 
ability which satisfies detailed balance is given, as usual, by 



: mm 



mil, =mml,-7 { 

i?/(0<-0)7r(0)y \^ ((0-/(0))2+e2j^(0)y 



As e becomes very small, the proposal density becomes very sharply 
peaked about = /(<?!>)• In fact, we are interested in taking the limit e ^ 
to obtain a Dirac (5-function: 

6{x) = — lim — ^. 

The probability of proposing a value between f{(p) — e < (j) < f{(j)) + eis 
given by 



However, we need to consider a range of values which tends to a single value 
but for which the probability tends to unity as e — *■ 0. Clearly, a larger range 
is needed. The probability of proposing a value between — < < 
f{4>) + is 

/(</') + Ve _ _ 2 f I 

dcj) Rf{(j> <- (^) = - tan"^ 

which does tends to unity as e — > 0. If /(</>) is more than ^/s away from 0, 
then the probability that the transition is actually made is 

/(0)+v/? _ _ pfW+V^ _ _ 

d0 Wf{(f> ^ 0) = / deb Pacc(0 ^ ^ 0). 

Given that Rf(<j) ^ (j)) is always positive, the above integral is given by 

,2 _i / 1 \ 1 /•■^('^H^ ~ e ttU) 

mm — tan ^ ' ' ' -'-^ 



If we write (/) = + J/, then the remaining integral above becomes 
1 eii{S{c^)+y) 



TT 



({^'f{f{4>) + vW+e^y{ct>) 
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Now consider two cases: (a) .f{f{(f>)) ^ (f>, and (b) f{f{(f>)) ~ (p. 

For the first case when f{f{4>)) ^ 4>, since we are integrating over such 
a small range, the series expansion of the integrand about the central y = 
point should approximate the true integrand well, assuming no singularities. 
Performing this expansion about y = 0, then integrating, one finds a zero 
probability as e ^ 0, as long as 7r(/((/)))/7r((/)) is finite. To see this, begin 
by noting that the integral has the form 

£ /■'^, {ao + aiy + a2y'^ + ...) 

dy-. 



TT (e:2 + bo + biy + 62^^ + b^y^ 



where the aj,bj are constants. If the denominator does not become zero 
anywhere in the integration range, then the integral can be well approxi- 
mated by 



eao 



^(&o + J_ 
~ 7r(6o + £2) 



dy[ 1 + ci{e)y + C2{e)y^ + C3,{e)y^ + Ci{e)y^ + . 
(l + i£C2(£) + |£^C4(£) H ) ^ as £ ^ if 60 ^ 0, 



where the Cj{e) tend to finite constants as £ — > 0. 

For the second case when f{f{(t>)) ~ 4>, more care is needed when ex- 
panding about y ~ since the integral has the form 

^, {ao + aiy + a2y^ + ...) 

ay-, 



TT " (£2 -f 62^2 + 632/3 + 542^4 

To reproduce the integrand to a good approximation over the entire inte- 
gration range, the 622/2 term must be retained in the denominator, and the 
rest of the function can be expanded about y = Q: 

£ , ao /1 , "1 , 0.2 2 , f 0-3 ^3~\ 3 



7rJ_y^ (£2 + 622/ ) I ao ao yao £ 
For 62 > 0, then the result of the integration is 

2ao , - 
— r= tan 

7rV^2 

Hence, the acceptance probability is given in the limit £ ^ by 



' (/t) {l + ^iV^ + '^2£ + rf3£'/' + ---} 
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In this case, oq = 7r(/((/)))/7r((/)) and 62 = where the ' indicates 

the derivative function. If we differentiate both sides of f{f{4>)) = (p with 
respect to 0, we see that for a self-inverse function, 

l = ^(/(/('^))) =/'(/('^))/'(0), (1) 



so that for a self-inverse function, 



rum 



(self-inverse function). (2) 



Taking the limit e — > 0, we have a proposal density and an acceptance 
probability given by 

i?/(0^<^) = 5(0-/(0)), /(/(0)) = 0, (3) 



Now specialize to the case of microcanonical updating in which the self- 
inverse function /(0) reflects the field in some way so as to preserve the 
action. Let 5(0) denote that part of the action which involves the field at 
the site x being updated. If is the current value of the field at site x, then 
let /(0) denote another value of the field for which 5(0) = S{f{(j))) so that 
7r(/(0)) = 7r(0). For an infinitesimal change (f> —f (f> + S4>, we have 

5(0 + ,50) = 5(/(0 + 50)). 

Expanding both sides, 

5(0) + 5'(0),50 + O(<502) = 5(/(0) + /'(0)<50 + O(,50^)) 

= 5(/(0)) + 5'(/(0)) /'(0),50 + O(<502) 
= 5(0) + 5'(/(0)) /'(0)J0 + O(J02). 

Solving this equation order by order in 50 leads to 

5'(0) 



5'(0) = 5'(/(0)) /'(0) ^ /'(0) 



Hence, 

S'{f{4>)) 5'(/(0)) 



/'(/W) 



5'(/(/(0))) 5'(0) 



63 



So the proposal and acceptance probability densities are 



( 



1 



) 



fifm 



S{f {<!>)) = 5(0), 



exp(-5[0]) 



mm 



/ 



V(j) cxp{-S[(j)]) 



So far, we have examined only the case of applying a single self-inverse 
function which leaves the action invariant. In the (/)^ field theory, the ac- 
tion preserving equation SS = will often have four solutions. In other 
words, more than one self-inverse action-preserving function are possible. 
The simplest way to proceed is to randomly pick one of the self-inverse 
action-preserving functions fj{4>) with equal probabilities, then apply the 
accept- reject condition on the resulting field value fj{4>)- 

The above procedure always proposes a change to the field, then an 
accept-reject step is applied. A straightforward modification of the above 
procedure is to not always propose a change. The above method can be 
generalized to include a probability /i of proposing a change; we will find 
that sometimes we will need /i < 1 to prevent (damped) oscillations in the 
autocorrelation function. 

The summary of our microcanonical updating process is as follows: 

(1) Decide to propose a new field value with probability fi. If the random 
decision is to retain the current field value, the steps below can be 
skipped. 

(2) Given initial value of the field at site x, solve SS{(I)) = 0. Let (pj 
denote the real solutions which differ from </>. These will be the roots 
of a cubic polynomial. Sometimes there will be three such real distinct 
solutions, other times there will be only one. (The case of degenerate 
solutions is highly unlikely.) 

(3) With equal probability, randomly choose one of the (pj as the proposed 
new field value. Let denote the chosen value. 

(4) Accept this value with probability 




If rejected, the original value (p is retained. 



The above procedure is repeated for each site, sequentially sweeping 
through the lattice. All of the above formulas have assumed that 5'(0) ^ 
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T T 

Fig. 11. The autocorrelation function p(r) for (<I>(t)<J>(0)) witht = l/{2asm) and <3?{t) = 
<t>{^,y,t) using agiri = 0.10,0.25,0.50 and A = on 24^ isotropic lattices. The 
parameter t refers to the number of Metropolis sweeps. The right plot has a logarithmic 
vertical scale. 

and S'{4>j) 7^ 0. In the extremely unlikely case that any such mmima, max- 
uTia, or inflection points are encountered, simply retain the original field 
value and move on to the next site. 

5.4. Autocorrelations in the free field A = theory 

Let us first study autocorrelations in the free field A = theory. The auto- 
correlation function pij) for the observable (<i>(t)<I>(0)) with t = \/{2asm) 
and <I>(i) = '^^y (jji^TUit) is shown in Figs. 11-14 for various different up- 
dating schemes. The parameter r is the number of compound sweeps, and 
Qsm = 0.10,0.25,0.50 are used for A = on 24"^ isotropic lattices. 

In Fig. 11, each compound sweep is just one Metropolis sweep. One sees 
that nearly 2200 sweeps are needed to reduce autocorrelations down to 0.1 
for agm = 0.10. Fig. 12 shows the dramatic reduction in autocorrelations by 
including microcanonical sweeps in the updating. Each compound sweep in 
this figure is one Metropolis sweep, followed by one microcanonical sweep 
with probability /.i of proposing a change. The autocorrelation function has 
undesirable oscillations when /.j = 1 in the free field theory, as shown in 
the left hand plot in this figure. These oscillations can be removed by using 
/.J = 0.98, as shown in the right-hand plot of this figure. 

In Figs. 13 and 14, each compound sweep is one Metropolis, followed by 
Nfj^ microcanonical sweeps with probability fi of proposing a change. In the 
left-hand plot of Fig. 13, iV^ = 1 is used and /.j is varied. One sees that in 
the free scalar field theory, autocorrelations improve as p increases towards 
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Fig. 12. The autocorrelation function p(t) for (<J>{t)<l?(0)) with t = l/(2as m) and •^(t) = 
<t>{x, y,t) using asm = 0.10,0.25,0.50 and A = on 24^ isotropic lattices. The 
parameter t refers to the number of compound sweeps. Each compound sweep is one 
Metropolis sweep, followed by one microcanonical sweep with probability fj, of proposing 
a change. In the left plot, fi = 1, whereas fi = 0.98 in the right plot. 




Fig. 13. The autocorrelation function p(t) for (<I>{t)<3?(0)) with t = l/[2asm) and <3?{t) = 
't'i^j Vy t) using asm = 0.10 only and A = on 24'^ isotropic lattices. The parameter 
r refers to the number of compound sweeps. Each compound sweep is one Metropolis 
sweep, followed by A^^ microcanonical sweeps with probability /i of proposing a change. 
In the left plot, A^^ = 1 and fi is varied; in the right plot, fj, = 0.98 and A^^ is varied. 



unity, but /i = 1 introduces undesirable oscillations. Setting /i = 0.98 seems 
to be ideal. This value is used in the right-hand plot in this figure, and N^j, 
is varied. This plot shows autocorrelations improving as increases, but 
there are diminishing returns. As Nf^ increases, each compound sweep takes 
significantly more time, so this has to be weighed against the improvement 
in the autocorrelations. One finds that Nf^ = 3 seems optimal in the case 
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Fig. 14. The autocorrelation function p(r) for (<I>(t)<I>(0)) witlit = l/{2asm) and <J>{t) = 
Eaij, 4>(.^^y^'t) using asm = 0.25 (left plot) and a^m = 0.50 (right plot) and A = on 
24^ isotropic lattices. The parameter r refers to the number of compound sweeps. Each 
compound sweep is one Metropolis sweep, followed by Np^ microcanonical sweeps with 
probability ^ = 0.98 of proposing a change. 

of asm — 0.10. The autocorrelations for different with ji = 0.98 fixed 
arc also shown in Fig. 14 for a^m ~ 0.25 (left plot) and a^m = 0.50 (right 
plot). There is a dramatic improvement in going from = to = 1, 
but there is no further gain in increasing any further for these mass 
parameters. 



5.5. Extracting observables 

The stationary-state energies can be extracted from the asymptotic decay 
rates of temporal correlations of the fields. The temporal evolution of the 
field as a Heisenbcrg-picture quantum operator is 

and under certain general assumptions (an action satisfying both link and 
site reflection positivity) and ignoring temporal boundary conditions, then 
for t > 0, 

mmmO) = Y.{0\e"'me-"'\n){n\m\0), 



i\m\o) 



where a complete set of (discrete) eigenstates of H satisfying H\n) = Enln) 
has been inserted. Note that on a lattice with periodic boundary condi- 
tions, momentum is discrete, so the energy eigenstates are also discrete. If 
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(1|(/)(0)|0) 7^ 0, then Ai and Ei — Eq can be extracted as t becomes large, 
assuming (O|0(O)|O) = 0. One can use any operator 0{t) which is a function 
of the field (f>{t) only on a time slice t. The extraction of Ai and Ei — Eq is 
done using a correlated— fit: 

= E (C'(^) - M{t, a)) a-,} (C(t') - M(t', a)) , 
w 

where C{t) represents the Monte Carlo estimates of the correlation function 
with covariance matrix au' and the model function is M{t,a) = aie~"°*. 
The covariance matrix au' is determined using the standard Monte Carlo 
variance formula. To determine estimates of the model parameters ag and 
ai, one minimizes the above with respect to the model parameters. 
Uncertainties in the best-fit parameters ao = Ei — Eq and ai = Ai are 
usually obtained by a jackknife or bootstrap procedure (more on this in a 
moment). The fit must be done for a time range imin l£ t < tmax such 
that an acceptable fit quality is obtained, that is, x^/dof sa 1. A sum of 
two-exponentials as a model function can be used to minimize sensitivity 
to tminj but the fit parameters associated with faster-decaying exponential 
are generally not good estimates of the gap to the next energy level and 
should be discarded. 

The Monte Carlo method exploits the central limit theorem to deter- 
mine the statistical uncertainty in the covariance matrix au'. But how 
can one determine the uncertainty in the fit parameters ap and ai? These 
are not simple quantities that can be defined properly on a single path; 
the Monte Carlo variance formula cannot be easily applied. The use of 
resampling schemes solves this problem. Let (/) denote the Monte Carlo 
estimate of some quantity / using all elements in a Markov chain, for 
fc = 1, 2, . . . , A^, and let (/),/ denote the Monte Carlo estimate of / omitting 
Xj (so only the other — 1 Xk values are used) . The so-called jackknife 
error estimate in (/) is given by 

/ N \ 1/2 

1 ^ 1 

k=l k^.J 

Again, the Monte Carlo variance formula can be used to determine the 
covariance matrix att' for the correlation function itself in x^, and the 
jackknife method gives an estimate of the errors in the model fit parameters. 
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Another resampling scheme is the bootstrap. Again, let (/) denote the 
Monte Carlo estimate of some quantity / using all Xk for k = 1,2, . . . , N , 
and let (/){, denote the Monte Carlo estimate of / using a new set Xk, for 
k = 1,2, . . . , N , where each Xk is one of the original Xj chosen randomly 
with equal probability (a bootstrap sample). A given Xj can occur multiple 
times in the bootstrap sample. After one obtains a large number B of such 
estimates, then the following quantity (/) = {l/B)J2b=i{f)b is evaluated. 
The bootstrap error is given by 



For a given quantity, a plot of its probability distribution can be obtained 
from its B bootstrap estimates. 

A particularly good visual tool to see how well an energy can be ex- 
tracted is the so-called effective mass . For a correlator C {t) , the effective 
mass is defined by 



The value Ei — Eq is seen as a large-time plateau in the effective mass. 
Contributions from faster-decaying exponentials are seen as deviations of 
the effective mass from its asymptotic plateau value. A "good" operator 
with little coupling to higher-lying states results in a rapid onset of the 
plateau. Statistical noise generally grows with t. 

Extracting more than just the lowest energy in a symmetry channel 
requires a hermitian matrix of correlation functions Cij(t). Let Xn{t,to) 
denote the eigenvalues of C{to)~^^^ C{t) C(to)~^^^, where tg is some fixed 
reference time. These eigenvalues can be viewed as principal correlators. 
Assume that they are ordered such that Ao > Ai > • • • as i becomes large, 





This is a function which tends to Ei — Eq a.s t becomes large: 
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then one can show that 

Hm Kit, to) = e-^"(*-*o) f 1 + ©(e-^-^*-*"))) , 

t— >oo V / 

A„ = min \Ek - En\. 

The effective masses associated with these principal correlators arc known 
as principal effective masses: 

{n)t,\ 1 ( A„(i,to) \ 
<-^'^-'^ Wt + a,M) )- 
For an X correlation matrix, these functions tend, as t becomes large, 
to the energies of the N lowest-lying states that couple to the operators 
whose correlations are being evaluated. Examples of such effective masses 
will be shown in the next section. 

5.6. Spectrum of the free field A = theory 

To determine the spectrum for the free-field case (A = 0) on an x Ny x Nt 
lattice, define 

<I>(i,n,,n^) = Y,cj){x,y,t) e2™/JV.+2^-„/iv„ . 
x-.y 

The lowest six levels having total zero momentum can be extracted using 
the following set of six operators: 

Oo(t) = $(i,0,0), 

Oi(t) -$(t,0,0) $(t,0,0), 

O2W = $(i,l,0) $(i,-l,0), 

OaW -$(i,0,l) <I>(i,0,-l), 

O4W = $(i,l,l) <!>(<, -1,-1), 

OsW = $(t,l,-l) $(i,-l,l). 

Principal effective masses from an actual Monte Carlo calculation using 
these operators in the A = scalar field theory are shown in Fig. 15. The 
six lowest-lying levels are extracted, and a 24^ x 48 isotropic lattice with 
asm = 0.25 is used. In this figure, the Monte Carlo results are compared 
with the known exact results: 0.24935 for the mass, 0.49871 for twice the 
mass, 0.71903 for the two states having minimal relative momenta, and 
0.88451 for the next two states. Note that the Monte Carlo calculation only 
gives results for energies in terms of a^^. To obtain energies in terms of 
MeV, it is necessary to set the value of a^^; this is known as setting the 
scale. Experimental input is needed for this. 
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Fig. 15. Principal effective masses from the 6x6 matrix of temporal correlations of the 
operators discussed in the text for the free field theory on a 24^ X 48 isotropic lattice 
with asm = 0.25. The left-hand plot shows the lowest-lying level, a single particle at 
rest. The right-hand plot shows the six lowest-lying levels. All energies agree with exactly 
determined values: 0.24935 for the single particle mass, 0.49871 for two particles at rest, 
0.71903 for the two states consisting of two particles having equal and opposite minimal 
momenta, and 0.88451 for the next two energies. 

5.7. The interacting theory 

We now introduce particle interactions by allowing the coupling A to be 
greater than zero. Negative values of A are not allowed as the energy would 
not be bounded from below. 

The autocorrelation function p(t) of (<i>(t)<I>(0)) for t ~ l/(2asTOgap) in 
the interacting theory is shown in Fig. 16 for two sets of k, A parameters and 
various values of A^^. In these figures, r refers to the number of compound 
sweeps, and each compound sweep is one Metropolis sweep, followed by Nfj_ 
microcanonical sweeps with probability ^ = 1 of proposing a change. In 
the left plot, t = 2at is used with k = 0.1930 and A = 0.300 on 24^ x 48 
isotropic lattices and a^mgap ~ 0.25. In the right plot, t = bat is used with 
K = 0.1970 and A = 0.300 on 32^ x 96 isotropic lattices and a^mgap ^ 0.10. 
The microcanonical acceptance rate is about 80% in both cases. These 
plots again show how important the microcanonical sweeps are in reducing 
autocorrelations in the Markov chains. 

Single particle masses on 24"^ isotropic lattices are shown for various 
values of k, A in Fig. 17. One sees that the mass parameter m in the La- 
grangian is no longer the mass of the particle. This 2 + 1-dimensional <j>'^ 
theory has two phases separated by a line of critical points. For each value 
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Fig. 16. Autocorrelation function p{t) of the quantity (<I>(t)<I>(0)) for t ~ l/(2asm) and 
= X]it/ "/"(^^i y> *) against the number of compound sweeps r, each consisting of one 
Metropohs sweep followed by microcanonical sweeps with probability fi of proposing 
a change at each site. In both plots, fi = 1.00 and Nfj, is varied, (a) In the left-hand 
plot, t = 2at is used with k = 0.1930 and A = 0.300 on a 2A^ X 48 isotropic lattice such 
that the mass gap is about 0.25; the microcanonical acceptance rate is 0.807. (b) In the 
right-hand plot, t = 5at is used with k = 0.1970 and A = 0.300 on a 32^ X 96 isotropic 
lattice so that the mass gap is about 0.10; the microcanonical acceptance rate is 0.804. 




Fig. 17. The mass gap OsTTigap in 2 -|- 1-dimensional if>'^ theory on 24^ isotropic lattices 
for various values of the hopping parameter k and the coupling A. 



of A, there exists a critical value Kc(A) at which the mass gap goes to zero. 
The so-called symmetric phase occurs for k < Kc(A); in this phase, the 
— > —0 symmetry holds, and {(j)) = 0. The so-called broken phase occurs 
for K, > Kc(A); in this phase, the symmetry (jj — *■ —(jj of the Lagrangian 
is spontaneously broken, such that there is a nonzero vacuum expectation 
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Fig. 18. The phase boundary for 2+1-dimcnsional theory, shown as a critical value 
Kc{X) of the hopping parameter as a function of the coupling A. The symmetry <j) — > —<!> 
is spontaneously broken above the phase boundary. The theory becomes the Ising model 
as A — > oo. 

value for the field: {(f)) ^ Q. A somewhat qualitative sketch of the phase 
boundary is shown in Fig. 18. 

Although the physics of this field theory are very interesting, the focus of 
these lectures is the Monte Carlo method, which we have already discussed 
in great detail. In these six lectures, there is inadequate time to study the 
renormalization group flows of this model, nor to describe the calculational 
techniques needed to probe the phase transition. Phase transitions only 
occur in systems having an infinite number of degrees of freedom, requiring 
an infinite number of lattice sites. Monte Carlo studied must be performed 
with a finite number of lattice sites out of necessity, so the study of the phase 
transition requires a few clever tricks. The interested reader is invited to 
further explore this field theory after these lectures. 

6. Monte Carlo calculations in lattice quantum 
chromodynamics 

The field of lattice QCD began with the famous paper of Ken Wilson in 
1974.^^ Wilson found a way of formulating QCD on a hypercubic space- 
time lattice which preserved local gauge invariance, an important property 
linked to the renormalizability of the theory. In lattice QCD, the quarks 
reside on the sites, while the gluon field resides on the links between sites. 



n I I I I I I I I I I I I I I I I I n 
broken phase 



symmetric phase 



_] I I I I I I I I I I I I I I I I I L. 
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Wilson advocated the use of link variables Uf^{x) which are path-ordered 
exponentials of the gauge field from one site to its neighbor. In other words, 
the Ufi{x) are parallel transport matrices. In QCD, the link variables are 
elements of the group SU{2>). For gluons, the path integration involves an 
eight-dimensional integral on each link, so the path integral has dimension 
32NxNyNzNt. For a 24"* lattice, the path integral has a dimension near 10.6 
million. Of course, the fermion quark fields have to be dealt with, too. The 
quark fields are Grassmann valued and obey Fermi-Dirac statistics. This 
introduces major complications into the Monte Carlo updating procedure. 

Current Monte Carlo updating methods in pure gauge theories work 
very well. The best methods are similar to that already described in the (j)'^ 
theory. One sweeps through the lattice using either a Metropolis or a heat 
bath local updating scheme, then sweeps some number of times with an 
action-preserving local updating method. In some field theories, it is possi- 
ble to sample the probability density associated with a local update, either 
using a transformation method or the rejection method. Such a local updat- 
ing method is called a heat bath.^^ A heat bath method for SU{2) lattice 
gauge theory was proposed in Ref. 13 and was later improved in Ref. 14. A 
heat bath method for SU{3) exists, but a more efficient pseudo-hcatbath 
method^^ of updating an SU{3) matrix by successive SU{2) subgroups is 
usually used. Microcanonical (ovcrrclaxation) updating of SU (2) subgroups 
was proposed in Ref. 17, and a microcanonical procedure for SU{3) is de- 
scribed in Ref. 18. Local microcanonical updating algorithms for general 
SU{N) gauge theories have been devised in Refs. 19,20. 

Monte Carlo updating methods including quarks in lattice QCD are 
steadily improving. The methods currently used are based on the so-called 
Hybrid Monte Carlo^-'^ method and a variant known as RHMC.^^ Fermions 
present special challenges in terms of formulating them on a lattice and car- 
rying out Monte Carlo calculations. Unfortunately, since my time is nearly 
up, I will not be able to discuss these issues any further here. 

Before concluding, I would like to present the results of two Monte Carlo 
studies in lattice gauge theory and lattice QCD that I have been personally 
involved in. An amazing feature of nonabelian gauge theories is that the 
massless gluons can bind to form rather massive objects known as glueballs. 
The analog of such particles in electromagnetism would be massive globules 
of pure light! The mass spectrum of such objects from Ref. 23 is shown in 
Fig. 19. States are labeled by J^*^, where J is the spin, P is the parity, 
and C is the charge conjugation quantum number. The scale has been set 
using Tq"^ = 410(20) MeV, where rg is a convenient hadron scale defined 
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Fig. 19. The mass spectrum of glucballs in tiic pure SU{3) gauge theory from Ref 23. 
The masses are given in terms of the hadronic scale rg along the left vertical axis and 
in terms of GeV along the right vertical axis (assuming r^^ = 410 MeV). The mass 
uncertainties indicated by the vertical extents of the boxes do not include the uncertainty 
in setting rg. The locations of states whose interpretation requires further study are 
indicated by the dashed hollow boxes. 



in terms of the static quark potential. These results were computed using 
Cabibbo-Marinari pseudo-heatbath and Creutz microcanonical overrelax- 
ation. A 24 X 24 correlation matrix was used in each symmetry channel. 
The mass of the lowest-lying scalar glueball is the only particle mass that 
I know of that has a bounty on its head: the Clay Mathematics Institute 
will pay $1 million for a mathematical proof that this mass is nonzero. 

I am currently a member of a collaboration of lattice QCD theorists 
known as the Lattice Hadron Physics Collaboration (LHPC). One of our 
current goals is to use the Monte Carlo method to predict the baryon and 
meson spectrum of QCD. Our plans on how we intend to accomplish this are 
outlined in Ref. 25. We have recently completed exploratory studies^'' on 
small anisotropic lattices in the quenched approximation to QCD (ignoring 
quark loops) at relatively large quark masses. These studies demonstrated 
our ability to isolate up to nine energy eigenvalues from our correlation 
functions. The spectrum of isospin 7=1/2 states, albeit at an unphysically 
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Fig. 20. (Left) An exploratory first computation of the nucleon spectrum from 200 
quenched configurations on a 12'^ X 48 anisotropic lattice using the Wilson gauge and 
quark actions with as ~ 0.1 fm, as /at ~ 3.0 and m.,^ ~ 700 MeV. The results are from 
Ref. 24. (Right) The currently known spectrum from experiment. Black denotes four-star 
states, blue denotes three-star states, tan denotes two-star states, and gray denotes a 
one-star state. 



large quark mass corresponding to m^r — 700 McV, is shown as the left-hand 
panel in Fig. 20. The right-hand panel shows the experimental spectrum, 
with states assigned according to the irreducible representations of the cubic 
group; even in this quenched calculation, at unphysical pion masses, there 
are tantalizing suggestions of the existence of a band of negative-parity 
states well separated from the higher excitations, as observed experimen- 
tally. These calculations are ongoing. 



7. Conclusion 

This series of six lectures was an introduction to using the Monte Carlo 
method to carry out nonperturbative studies in quantum field theories. 
First, the path integral method in nonrelativistic quantum mechanics was 
reviewed and several simple examples were studied: a free particle in one 
dimension, the one-dimensional infinite square well, a free particle in one di- 
mension with periodic boundary conditions, and the one-dimensional simple 
harmonic oscillator. The extraction of observables from correlation func- 
tions or vacuum expectation values was discussed, and the evaluation of 
these correlation functions using ratios of path integrals was described, in- 
troducing Wick rotations to imaginary time. 
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A major portion of this series of lectures was then devoted to the eval- 
uation of path integrals in the imaginary time formalism using the Monte 
Carlo method with Markov chains. After a brief review of probability the- 
ory, the law of large numbers and the central limit theorem were used to 
justify simple Monte Carlo integration. However, the estimation of path 
integrals with sufficient accuracy required the need for clever importance 
sampling, which led to the use of stationary stochastic processes, and in par- 
ticular, ergodic Markov chains. Several properties of Markov chains were 
discussed and proven, particularly the fundamental limit theorem for er- 
godic Markov chains. The Metropolis-Hastings method of constructing a 
Markov chain was described. 

Next, the one-dimensional simple harmonic oscillator was studied us- 
ing the Metropolis method. One of the simplest quantum field theories, a 
real scalar field theory with a interaction in three space-time dimen- 
sions, was then studied. The Metropolis method was seen to be plagued 
by strong autocorrelations. An action-preserving (microcanonical) updat- 
ing method was described, and its effectiveness in reducing autocorrelations 
was demonstrated. The extraction of stationary-state energies was detailed, 
introducing correlated-^^ fitting, as well as jackknife and bootstrap error 
estimates. The lectures concluded with some comments and results from 
lattice QCD. 

In addition to the references already cited, see Ref. 26-28 for an intro- 
duction to probability, Refs. 29,30 for more details concerning stochastic 
processes, and Ref. 31 for a very good textbook on quantum fields on a 
lattice. 

This work was supported by the U.S. National Science Foundation 
through grant PHY-0354982. I would especially like to thank Robert Ed- 
wards, Jimmy Juge, Julius Kuti, Adam Lichtl, Mike Peardon, and David 
Richards for their helpful comments and suggestions. 
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